CREATION OF VARIABLE LENGTH AND SEQUENCE LINKER
REGIONS FOR DUAL-DOMAIN OR MULTI-DOMAIN
MOLECULES

FIELD OF THE INVENTION

This invention in the field of molecular biology relates to libraries of dual- 
domain nucleic acids and/or proteins in which the domains are joined by a library of 
linkers that vary in length and sequence. 

BACKGROUND OF THE INVENTION

Dual-domain polypeptides or dual-domain nucleic acids encoding such
polypeptides may have new, advantageous properties compared to the original
polypeptides or nucleic acids after which they are patterned. Such polypeptide domains
are generally linked using a linker region or linker domain. A generic designation of
such a polypeptide construct is D1-L-D2, wherein D1 and D2 are two structural domains
that are identical or different and L is the linker. For example, two cytosolic domains of
the membrane-spanning protein adenylyl cyclase coupled with a linker domain form a
soluble protein (Tang et al., Science, 268: 1769-1772 (1995)). An advantage of this
soluble form of adenylyl cyclase, which retains enzymatic activity, is that it can be
produced in much higher quantities than the native enzyme (Dessauer et al., J. Biol.
Chem., 16967-16974 (1996)).

Another type of polypeptide generated by linking two domains is a single chain
antibody or scFv. These single chain polypeptides include the variable (V) regions from
the heavy (H) and light (L) chains of a selected immunoglobulin (Ig) and recreate the
antigen binding site of the native Ig while being a fraction of its size (Skerra, A. et al.
(1988) Science, 240: 1038-1041; Pluckthun, A. et al. (1989) Methods Enzymol. 178:
497-515; Winter, G. et al. (1991) Nature, 349: 293-299; Bird et al. (1988) Science
242:423; Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879; U.S. Patents No.
4,704,692, 4,853,871, 4,946,778, 5,260,203, 5,455,030). A number of U.S. patents and
international patent publications of J. Huston and colleagues describe various two chain



DC043399 



1 



Dkt: LSB-006 



or two domain proteins, including single chain antibodies, joined by linker peptides and
optionally including cleavable sites (U.S. Patents No. 5888773, 5877305, 5861156,
5837846, 5753204, 5534254, 5525491, 5482858, 5476786, 5330902, 5302526,
5258498, 5132405, 5091513, 5013653, WO 9323537A1 (25-NOV-1993)).

An scFv is composed of a VH domain at its N-terminus and a VL domain at its
C-terminus (or vice versa) linked by a peptide linker. Correct folding of the VH and VL
regions is crucial for retention of antigen binding capacity by the scFv. The length and
sequence of the linker region are critical parameters for correct folding and biological
function. scFv chains are easier to express than the larger Fv fragments or even larger
Ig molecules (which are four chain complexes).

A ribozyme is a catalytic RNA molecule that cleaves other RNA molecules that
contain nucleic acid sequences complementary to particular targeting sequences in the
ribozyme. Two identical or different nucleic acid domains such as two ribozyme
domains can be joined to create a bifunctional ribozyme that can act on more than one
RNA substrate structure. General methods for constructing ribozymes, including
hairpin ribozymes, hammerhead ribozymes and RNAse P ribozymes, are known in the
art. Castanotto et al. (1994) Advances in Pharmacology, 25: 289-317, reviews
ribozymes (including group I, hammerhead, axhead, hairpin and RNAse P). Ribozymes
that can advantageously target desired specific sequences, such as HIV sequences, have
been described (Ho, A. et al., WO 9426877 (1994); Yu et al. (1993) Proc. Natl. Acad.
Sci. USA, 90:6340-6344; and Dropulic et al. (1992) J. Virol., 66:1432-1441).

The hammerhead ribozyme and the hairpin ribozyme are catalytic molecules
with antisense and endoribonuclease activity. Their intracellular expression can
confer significant resistance to, for example, HIV infection. Hammerhead ribozymes
are described in Rossi et al. (1991) Pharmac. Ther., 50:245-254; Forster et al. (1987)
Cell, 48:211-220; Uhlenbeck, O.C. (1987) Nature, 328:596-600; Haseloff, J. et al. (1988)
Nature, 334:585; Dropulic et al., supra; and Castanotto et al., supra, and references
cited therein. Hairpin ribozymes are disclosed in Hampel et al. (1990) Nucl. Acids Res.,
18:299-304; Hampel et al., EP 0360257 (1990); Haseloff, J.P. et al., US 5,254,678
(1993); Kraus, G. et al., US 5,958,768 (1999); Ho, A. et al., WO 9426877 (1994);
Ojwang et al. (1992) Proc. Natl. Acad. Sci. USA, 89: 10802-10806; Yamada et al. (1994)
Gene Therapy 1: 39-45; Leavitt et al. (1995) Proc. Natl. Acad. Sci. USA, 92: 699-703;
Leavitt et al., Human Gene Therapy, 5: 1151-1120; and Yamada et al. (1994) Virology,
205: 121-126.

For convenience, the conventional single letter nucleotide code to designate
positions wherein more than one base may be present is provided in Table 1.



TABLE 1

Code   For RNA          For DNA          Note
r      g or a           g or a           (purine)
y      u or c           t or c           (pyrimidine)
s      g or c           g or c
w      a or u           a or t
v      a, g or c        a, g or c
x      c, u, or a       c, t, or a
n      a, g, c, or u    a, g, c, or t


(Obviously, in an r:y pairing, if r=g then y=c, etc.) 



The typical substrate sequence for hairpin ribozymes is
nnng/cn*gucnnnnnnnn (where n*g is the cleavage site). The hammerhead
ribozyme cleaves at any nux sequence. Thus, the same substrate target within the
hairpin leader sequence, guc, is targetable by the hammerhead ribozyme.
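
The degenerate codes of Table 1 can be applied mechanically to locate candidate target sites. The following is a minimal sketch, assuming the Table 1 code assignments for RNA; the function names and the example sequence are hypothetical and serve only to illustrate how a nux or guc motif might be searched for.

```python
import re

# Degenerate RNA codes as listed in Table 1 (lowercase, as in the text).
RNA_CODES = {
    "a": "a", "c": "c", "g": "g", "u": "u",
    "r": "ga", "y": "uc", "s": "gc", "w": "au",
    "v": "agc", "x": "cua", "n": "agcu",
}


def to_regex(pattern):
    """Translate a degenerate RNA pattern into a regular expression."""
    return "".join("[" + RNA_CODES[ch] + "]" for ch in pattern)


def find_sites(rna, pattern):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead makes each match zero-width so overlapping sites are reported.
    regex = re.compile("(?=" + to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(rna.lower())]


# Hypothetical RNA sequence, invented for illustration only.
target = "acugacgucaaauuggcc"
hammerhead_sites = find_sites(target, "nux")  # any nux triplet
guc_sites = find_sites(target, "guc")         # the guc motif shared with the hairpin target
```

Every guc site is necessarily also a nux site, which is the point made in the text above.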

Two DNA domains can also be linked to form a dual-domain DNA molecule.
Certain DNA domains bind to proteins such as DNA polymerases, endonucleases, and
transcription factors. Thus, two DNA domains can be linked to form a dual-domain
DNA molecule that binds one or more DNA binding proteins.

Those skilled in the art will know of the existence of other nucleic acid or
polypeptide domains that may be advantageously linked to form a dual-domain nucleic
acid or polypeptide with one or more functions. Those of skill will also recognize the
general desirability of methods that yield such products.

The desired property of a dual-domain DNA, ribozyme or protein molecule can
be optimized by modifying the nucleic acid that (1) constitutes the DNA domain, (2)
encodes the ribozyme sequence or (3) encodes the protein domain. This is achieved
through a variety of conventional techniques. In one approach, the sequence or length
of the linker region is varied in an effort to optimize the dual-domain molecule. The
length and sequence of the linker region may indeed be critical to the function of a
dual-domain protein.

Methods for generating an scFv dual-domain protein with linkers of varying
peptide length are known in the art (e.g., U.S. 5,837,242). Changes in sequence or
length of the linker can adversely affect the stability, protease susceptibility, binding
activity and expression levels of the scFv. Because the effect of a change in linker
sequence or length on the function(s) of the dual-domain polypeptide has been generally
unpredictable, the effect on bioactivity of varying particular amino acid residues in the
linker or changing its overall length generally cannot be determined a priori.

There is thus a need for methods that permit creation of a nucleic acid library
that encodes D1-L-D2 (or higher order) structures wherein L has random length and
sequence. The dual-domain protein can be expressed from the library and the properties
of interest can be analyzed. Once a protein is identified as having "optimal" properties,
its sequence can be determined by resolving the nucleotide sequence of the clone that
encodes that protein. This approach obviates the necessity of creating and testing
individual clones until finding one with the desired property.

The polymerase chain reaction (PCR) has been used to generate libraries of
nucleic acid products that have two domains connected by a linker having different
sequences or different lengths. However, no currently available method permits
simultaneous introduction of both random length and random sequence into the linker
region of a population of nucleic acids.

Expression Systems

Many expression systems for heterologous proteins are known in the art. These
include bacterial systems which have the advantages of rapid and abundant production,
but are limited in many instances by their inability to produce properly folded and
soluble proteins (unless the proteins are subjected to cycles of denaturation and
renaturation). Baculovirus systems drive expression through the secretory pathways of
insect cells, thereby increasing the probability of improved protein solubility
(Kretzschmar, T. et al. (1996) J. Immunol. Methods 195:93-101; Brocks, B. et al.
(1997), Immunotechnology 3:173-184). Because manipulating the virus and growing
insect cells can be time consuming and costly, the system is less suitable for expression
of certain types of proteins, for example tumor-specific or individual-specific proteins
such as idiotypic scFv polypeptides. There is therefore a need in the art for suitable
rapid and economical expression systems to produce useful dual-domain proteins, one
example of which is an idiotypic scFv vaccine for treating B-cell lymphoma. The
present invention addresses this need.

SUMMARY OF THE INVENTION

The present inventors have conceived of an approach for generating a
library of dual-domain or multi-domain (>2) polypeptides from appropriate coding
nucleic acids, which library is characterized by the members having random linkers
linking each pair of polypeptide domains, wherein the random linkers have variable
length and sequence. The nucleotide sequences encoding the linkers comprise a
repeated pattern of degenerate triplet bases. The first and second (and/or higher order)
domains may be the same or different from one another. The amino acid composition of
an entire linker region may include between 1 and about 20 different amino acids with
each repeated pattern of degenerate triplet bases encoding between 1 and about 12
different amino acids. The preferred linker length ranges from 1 to 50 amino acids. In
one embodiment, the polypeptide is a single chain immunoglobulin or single chain
antibody (scFv) molecule wherein one domain is an immunoglobulin VH domain and
the other domain is an immunoglobulin VL domain.

More specifically, the present invention is directed to a library of dual-domain
nucleic acid molecules each of which has (a) a first and a second domain; and (b) separating
and linking the domains, a linker which is a member of a randomized library of linkers
that (i) vary in size and nucleotide sequence, and (ii) consist of a repeated pattern of
degenerate repeated triplet nucleotides.
In the above library, the repeated pattern of degenerate repeated triplet
nucleotides of the linkers has the following properties:

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of
the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet; or

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet.

Preferably, the nucleotide in the first and second positions of each repeated
triplet is selected from any two of deoxyadenosine, deoxyguanosine, deoxycytidine or
deoxythymidine. In another embodiment, (i) position 1 of each repeated triplet is
deoxyadenosine or deoxyguanosine; (ii) position 2 of each repeated triplet is
deoxycytidine or deoxyguanosine; and (iii) position 3 of each repeated triplet is
deoxythymidine.
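
In the single letter code of Table 1, the embodiment with a or g at position 1, c or g at position 2 and t at position 3 corresponds to the degenerate triplet rst. The following is a minimal sketch, assuming the standard genetic code, of how such a triplet can be expanded into its concrete codons and the amino acids they encode; the function names are hypothetical.

```python
from itertools import product

# Degenerate DNA codes as listed in Table 1 (x is the document's own symbol).
DNA_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "y": "tc", "s": "gc", "w": "at",
    "v": "agc", "x": "cta", "n": "agct",
}

# Standard genetic code, codons ordered t, c, a, g at each position.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def expand_triplet(triplet):
    """List every concrete codon covered by a degenerate triplet."""
    return ["".join(c) for c in product(*(DNA_CODES[b] for b in triplet))]


def encoded_amino_acids(triplet):
    """Return the sorted one-letter amino acids encoded by a degenerate triplet."""
    return sorted({CODON_TABLE[c] for c in expand_triplet(triplet)})


codons = expand_triplet("rst")         # ggt, gct, agt, act
residues = encoded_amino_acids("rst")  # Ala, Gly, Ser, Thr in one-letter code
```

Under these assumptions each rst repeat encodes one of four small residues (Gly, Ala, Ser or Thr) and never a stop codon.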

In another embodiment, two different repeated patterns of degenerate triplet
bases are combined to generate a population of linkers used to produce dual-domain
molecules. The combination of different repeated patterns of degenerate triplet bases is
used to increase the complexity of the linker sequences obtained from the population.
The different repeats can also be used to introduce differing structural or biochemical
properties to the linker region. For example, degenerate triplet vwc and degenerate
triplet nvt are used as the nontemplated sequence. In this example, the degenerate
linker sequence is (vwc)x(nvt)y where x = 1 to 20 and y = 1 to 20. This combination
would produce linkers containing different combinations of amino acids within each
repeat as well as differing lengths of linkers.
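
The composition of a (vwc)x(nvt)y linker can be explored in silico. The following is a minimal sketch, assuming the Table 1 codes and the standard genetic code; it counts the amino acids reachable from each repeat and draws one random library member. The helper names and the random seed are hypothetical, and the sampling is uniform per base, which is an assumption rather than a property of any real oligonucleotide synthesis.

```python
import random
from itertools import product

# Degenerate DNA codes as listed in Table 1.
DNA_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "y": "tc", "s": "gc", "w": "at",
    "v": "agc", "x": "cta", "n": "agct",
}

# Standard genetic code, codons ordered t, c, a, g at each position.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product("tcag", repeat=3))
}


def residues_for(triplet):
    """Set of amino acids encoded by one degenerate triplet."""
    return {CODON_TABLE["".join(c)]
            for c in product(*(DNA_CODES[b] for b in triplet))}


def sample_linker(x, y, rng):
    """Draw one concrete linker DNA sequence from the pattern (vwc)x(nvt)y."""
    pattern = "vwc" * x + "nvt" * y
    return "".join(rng.choice(DNA_CODES[b]) for b in pattern)


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x, y = rng.randint(1, 20), rng.randint(1, 20)
linker_dna = sample_linker(x, y, rng)
linker_peptide = translate(linker_dna)
vwc_residues = residues_for("vwc")  # 6 amino acids
nvt_residues = residues_for("nvt")  # 11 amino acids (12 codons, Ser twice)
```

With x and y each ranging from 1 to 20, the linker length in this sketch spans 2 to 40 residues, and neither repeat can encode a stop codon because position 3 is fixed as c or t.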

In one embodiment of the above library, at least one of the domains binds to a
protein. In another embodiment, both of the domains bind to a protein.

In yet another embodiment, at least one, preferably both, of the domains binds to
a nucleic acid that is not a member of the library.

In any of the above nucleic acid libraries, the first and the second domains are
preferably coding sequences.
The library, as described above, is preferably produced in plants or plant cells.
The present invention also provides a dual-domain or multi-domain nucleic acid
molecule selected from the library described above.

Also provided is a library of dual-domain polypeptide molecules each of which is
described by the formula D1-L-D2 (going from N-terminus to C-terminus) wherein

(a) D1 and D2 are polypeptide domains and

(b) L is a peptide or polypeptide linker which is a member of a randomized
library of linkers that vary in size and sequence, which library is encoded
by nucleic acid sequences consisting of a repeated pattern of degenerate
repeated triplet nucleotides.

In a preferred embodiment, the present invention is directed to a library of
multi-domain polypeptide molecules each of which comprises polypeptide domains D, each
pair of D's being linked by a peptide or polypeptide linker L, such that each molecule is
described by the formula

DxLy

wherein x is an integer between 2 and about n, wherein n is preferably about 20, y is an
integer between 1 and (n-1), with the proviso that for any value of x, y is preferably x-1;
D1 is bonded to a single C-terminal linker; Dn (the "ultimate" C-terminal domain) is
bonded to a single N-terminal linker; each of D2 to Dn-1 is bonded to an N-terminal and
a C-terminal linker; each L is a member of a randomized library of linkers that vary in
size and sequence, which linker library is encoded by nucleic acid sequences consisting
of a repeated pattern of degenerate repeated triplet nucleotides.

A preferred library is a library of dual-domain polypeptide molecules each of
which is described by the formula D1-L-D2 wherein

(a) D1 and D2 are polypeptide domains and

(b) L is a peptide or polypeptide linker which is a member of a randomized
library of linkers that vary in size and sequence, which library is encoded
by nucleic acid sequences consisting of a repeated pattern of degenerate
repeated triplet nucleotides.
In the above libraries of dual- or multi-domain polypeptide molecules, each
linker in the library preferably (i) has a length of between about 1 and 50 amino acid
residues and (ii) consists of between 1 and about 20 different amino acids, and (iii) each
repeated pattern of degenerate triplet bases encodes between 1 and 12 different amino
acids.

In the library of dual domain or multi-domain polypeptide molecules above, the 
repeated pattern of degenerate repeated triplet nucleotides encoding the linkers 
preferably has the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of 
the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet. 

Preferably, the nucleotide in the first and second positions of each repeated triplet is 
selected from any two of deoxyadenosine, deoxyguanosine, deoxycytidine or 
deoxythymidine. In one embodiment thereof (i) position 1 of each repeated triplet is 
deoxyadenosine or deoxyguanosine; (ii) position 2 of each repeated triplet is 
deoxycytidine or deoxyguanosine; and (iii) position 3 of each repeated triplet is 
deoxythymidine. 

The above library of dual- or multi-domain polypeptides is preferably produced 
in plant cells. 

Specific embodiments of this invention include any dual-domain (or multi-domain)
polypeptide molecule selected from the library as described above. One
embodiment provides a three domain peptide selected from the above library which is a
dual domain scFv polypeptide linked to a third polypeptide domain, wherein the third
domain is preferably a toxin polypeptide with therapeutic utility or an enzyme with
diagnostic utility or use as a research tool. The foregoing polypeptides are preferably
produced in plant cells.
This invention is further directed to a method for generating the library of
dual-domain nucleic acids as above, comprising:

a. obtaining two template DNA sequences that comprise the first and the second
domains;

b. preparing amplification primer pairs which amplify the first and second domains
where each primer pair comprises an upstream primer and a downstream primer,
each primer having a 5' end and a 3' end, wherein the downstream primer for
the first domain or the upstream primer for the second domain comprises a
nontemplated sequence,

the nontemplated sequence comprising a repeated pattern of degenerate
repeated triplet nucleotides, wherein at least two of the 5' terminal
triplets of the repeated pattern of degenerate repeated triplet nucleotides
have the same degenerate sequence;

c. amplifying the domains with the amplification primers to generate at least one
population of nucleic acid domains having different lengths and sequences in the
nontemplated sequence; and

d. ligating the nucleic acid domains generated in step (c) to generate a
population of dual-domain molecules.
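
Step (b) can be illustrated for the upstream primer of the second domain, which carries the nontemplated repeats at its 5' end followed by a templated region matching the sense strand of the domain. The following is a minimal sketch under that reading; the 18-base domain sequence, the choice of the rst triplet and five repeats, and the function names are all hypothetical. It models only the composition of the degenerate primer mixture, not the amplification or ligation steps.

```python
from math import prod

# Degenerate DNA codes as listed in Table 1.
DNA_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "y": "tc", "s": "gc", "w": "at",
    "v": "agc", "x": "cta", "n": "agct",
}


def upstream_primer(triplet, repeats, domain2_start):
    """Compose a 5'-to-3' primer: nontemplated repeats, then the templated region."""
    return triplet * repeats + domain2_start


def degeneracy(primer):
    """Number of distinct concrete sequences in the degenerate primer mixture."""
    return prod(len(DNA_CODES[base]) for base in primer)


# Hypothetical first 18 bases of a second-domain coding sequence (illustration only).
domain2_start = "gacatccagatgacccag"
primer = upstream_primer("rst", 5, domain2_start)
n_variants = degeneracy(primer)  # 4 codons per rst triplet, 5 repeats
```

In this sketch every triplet of the nontemplated region has the same degenerate sequence, so the requirement that at least two of the 5' terminal triplets share a degenerate sequence is trivially met.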

In the above method, the repeated pattern of degenerate repeated triplet nucleotides in at
least one of the primers preferably has the following properties:

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of
the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet; or

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet.

In one embodiment of the above libraries of dual- or multi-domain polypeptide
molecules, a linker in the library that is 10 or more residues in length should
contain at least three different residues and a linker in the library that is 20 or
more residues in length should contain at least four different residues.
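
The residue diversity criterion of this embodiment is simple to state as a check on a translated linker. The following is a minimal sketch; the function name and the example peptides are hypothetical.

```python
def meets_diversity_rule(linker):
    """Check the diversity criterion on a linker given as one-letter residues.

    Linkers of 20 or more residues need at least four different residues;
    linkers of 10 or more residues need at least three. Shorter linkers
    are not constrained by this criterion.
    """
    length = len(linker)
    distinct = len(set(linker))
    if length >= 20:
        return distinct >= 4
    if length >= 10:
        return distinct >= 3
    return True


# Hypothetical example linkers, for illustration only.
two_residue_linker = "GGGGSGGGGS"    # 10 residues, 2 different residues
three_residue_linker = "GSTGSTGSTG"  # 10 residues, 3 different residues
```

For example, `meets_diversity_rule(two_residue_linker)` is false, while `meets_diversity_rule(three_residue_linker)` is true.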
In the above method, at least one of the primers preferably contains a
nontemplated endonuclease recognition site.

In the foregoing methods, the template DNA sequences are preferably made by
reverse transcription of mRNA.

The method may further comprise the step of ligating the population of
dual-domain nucleic acids to vectors, and may further comprise the step of introducing
the vectors into a host. In these methods, the nucleic acid domains generally will encode
polypeptide domains, and the method preferably also comprises the step of expressing
dual-domain polypeptides encoded by the dual-domain nucleic acids. In an additional
step, the method may comprise transcribing RNA from the vectors.

For plant expression, the vectors should be compatible with replication and/or
expression of the nucleic acids in plant cells. The method preferably includes the steps
of introducing the transcribed RNA into a plant cell and expressing the dual-domain
(or multi-domain) polypeptide.

This invention also provides a population of dual-domain polypeptides, or a
dual-domain polypeptide selected from that population, produced by the method described
above. Preferably the population or selected polypeptide is produced in plant cells.

Also provided is a method of producing a dual domain (or, with appropriate
modifications, a multi-domain) polypeptide, comprising the steps of:

(a) joining a nucleic acid encoding the first domain of the polypeptide to a nucleic
acid encoding a first part of a linker to produce a first nucleic acid construct;

(b) joining the nucleic acid encoding a second part of the linker to a nucleic acid
encoding the second domain of the polypeptide to produce a second nucleic acid
construct;

(c) incorporating the first and the second constructs into a transient plant expression
vector in frame so that, when expressed, the polypeptide bears the first and
second domain separated by the linker as described by the formula D1-L-D2;

(d) transfecting a plant (or plant cell) with the vector so that the plant transiently
produces the polypeptide; and

(e) recovering the polypeptide as a soluble, functionally-folded protein.
General References

Unless otherwise indicated, the practice of many aspects of the present invention
employs conventional techniques of molecular biology, recombinant DNA technology
and immunology, which are within the skill of the art. Such techniques are described in
more detail in the scientific literature, for example, Sambrook, J. et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring
Harbor, NY, 1989; Ausubel, F.M. et al., Current Protocols in Molecular Biology,
Wiley-Interscience, New York, current volume; Alberts, B. et al., Molecular Biology of the
Cell, 2nd Ed., Garland Publishing, Inc., New York, NY (1989); Lewin, B.M., Genes IV,
Oxford University Press, Oxford (1990); Watson, J.D. et al., Recombinant DNA, Second
Edition, Scientific American Books, New York, 1992; Darnell, J.E. et al., Molecular
Cell Biology, Scientific American Books, Inc., New York, NY (1986); Old, R.W. et al.,
Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2nd Ed.,
University of California Press, Berkeley, CA (1981); DNA Cloning: A Practical
Approach, vol. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., Current
Edition); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds., Current Edition);
Transcription and Translation (B. Hames & S. Higgins, eds., Current Edition);
Methods in Enzymology: Guide to Molecular Cloning Techniques (Berger and Kimmel,
eds., 1987); Harlow, E. et al., Antibodies: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY, 1988; Coligan, J.E. et al., eds., Current
Protocols in Immunology, Wiley-Interscience, New York, 1991. Protein structure and
function are discussed in Schulz, G.E. et al., Principles of Protein Structure,
Springer-Verlag, New York, 1978, and Creighton, T.E., Proteins: Structure and Molecular
Properties, W.H. Freeman & Co., San Francisco, 1983.

DEFINITIONS

As used herein, the following terms have the meanings ascribed to them unless 
specified otherwise. 

A polypeptide or protein "domain" generally refers to a region of a polypeptide
chain that is folded in such a way that confers a particular structure and/or biochemical
function (Schulz et al., supra). Domains can be defined in structural or functional
terms. A functional domain can be a single structural domain, but may also include
more than one structural domain. Such functions can include enzymatic catalytic
activity, ligand binding, chelating of an atom or endogenous fluorescence. As discussed
above, and of particular importance to this invention, VH and VL regions of Ig molecules
each form single structural domains, which act in concert in forming an
antigen-combining site. A domain's function is dictated to a large extent by the distinct shape
into which it folds. Although most commonly used to describe proteins, a "domain" can
also describe a region of a nucleic acid, either the coding sequence of a polypeptide
domain, or a nucleic acid structure that carries out a particular function (e.g., a
ribozyme's catalytic activity or protein binding). Binding domains, defined by binding
to a binding partner (receptor or ligand), are exemplified by the VH and VL regions of Ig
molecules (see below), each of which forms a single structural domain and which act in
concert in forming an antigen-combining site. Other well-known binding domains are
extracellular domains of cell surface receptors that bind a respective ligand, for
example, a peptide hormone. Moreover, the portion of a polypeptide or peptide ligand
such as erythropoietin, GM-CSF or enkephalin that binds to its respective receptor is
considered a functional (binding) domain. Parts of proteins that are responsible for the
capacity to fluoresce (e.g., green fluorescent protein - GFP) are also considered
functional domains.

A binding domain of a DNA or RNA molecule is a part of the molecule that
binds a protein (preferably) such as a transcription factor (e.g., cAMP Response Element
Binding Protein (CREB)), a restriction enzyme (e.g., EcoR I) or a DNA polymerase
(e.g., Taq DNA Polymerase).

The present invention is directed in part to methods for creating dual-domain
molecules. In preferred dual-domain molecules, the linker region between the two
domains is varied whereas the sequence of the linked domains is held constant.

"Template DNA" refers to the DNA that is amplified by "amplification primer 
pairs" (the population of oligonucleotide primers used in the amplification reaction). 
This DNA may be produced by biological (recombinant) or synthetic (chemical) means. 
Further, mRNA may be reverse transcribed to form the template DNA that is used in the
amplification reaction.

An "upstream primer" is an oligonucleotide primer, or a mixture of 
oligonucleotide primers, that anneal(s) to the antisense strand of the template DNA. 

A "downstream primer" is an oligonucleotide primer, or a mixture of 
oligonucleotide primers, that anneal(s) to the sense strand of the template DNA. 

A "nontemplated sequence" is the portion of an amplification primer that 
contains a repeated nucleotide triplet. As the goal of this sequence is to introduce 
variability into the linker library, it is not complementary to the DNA sequence being 
amplified, e.g., the polypeptide domain-coding regions.

The phrase "repeated pattern of degenerate triplet bases" refers to a nucleic 
acid sequence wherein a set of three bases (a triplet) is repeated in the nontemplated 
sequence, creating a repeating motif where the individual bases in the repeating triplet 
are independently selected from a defined array. For example, where the repeated triplet
is nws (see Table 1), n can be any of a, c, g, or t; w can be a or t; and s can be
g or c, rendering the repeated pattern degenerate. Herein, these repeated triplets are
adjacent to each other. The nontemplated sequence of the amplification primer that
contains this "repeated pattern of degenerate triplet bases" is produced in vitro.

"Amplifying/amplification" refers to a reaction wherein the entire template 
DNA, or portions thereof, are duplicated at least once, preferably many times. 

"Ligating/ligation" refers to covalent coupling of two or more DNA strands (3' 
end to 5' end) using enzymatic and/or chemical methods.

A "nontemplated endonuclease recognition site" is a sequence within the 
nontemplated sequence that is recognized by a restriction endonuclease. 

One use of the term "library" herein refers to a population, set or collection of
nucleic acid molecules consisting of domains joined by linker sequences, which linkers
vary in size and nucleotide sequence and which are produced using the methods
described. The number of library members contained in the library which differ in
nucleotide sequence is determined by the number of sequences contained in the repeated
pattern of degenerate triplet bases. The term "library" is also applied to the population
of polypeptides encoded by the nucleic acid library.
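
The dependence of library size on the degenerate pattern can be made concrete. The following is a minimal sketch, assuming the Table 1 codes: a triplet covering c concrete codons, repeated k times, gives c to the power k nucleotide sequences, and a library spanning 1 to N repeats gives the sum of those terms. The function names and the choice of three repeats are hypothetical, and the count is a theoretical upper bound rather than the number of clones actually recovered.

```python
from math import prod

# Degenerate DNA codes as listed in Table 1.
DNA_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "y": "tc", "s": "gc", "w": "at",
    "v": "agc", "x": "cta", "n": "agct",
}


def codons_per_triplet(triplet):
    """Number of concrete codons covered by one degenerate triplet."""
    return prod(len(DNA_CODES[base]) for base in triplet)


def linker_sequence_count(triplet, max_repeats):
    """Distinct linker nucleotide sequences for 1..max_repeats adjacent repeats."""
    c = codons_per_triplet(triplet)
    return sum(c ** k for k in range(1, max_repeats + 1))


nws_codons = codons_per_triplet("nws")           # 4 * 2 * 2
nws_library = linker_sequence_count("nws", 3)    # 16 + 256 + 4096
```

The count grows geometrically with the number of repeats, which is why even short degenerate patterns can yield large linker libraries.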

As used herein, a "linker" at the nucleic acid level is a nucleic acid molecule or
sequence that joins two nucleic acid domains or two nucleic acid sequences encoding
two polypeptide domains. The linker sequence has a pattern of degenerate repeated
triplet nucleotides with the following properties:

(i) position 1 of each repeated triplet cannot be the same nucleotide as position
2 of the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet; or

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet.

At the protein level, the linker is the peptide expression product of the linker nucleic
acid sequence. In a preferred embodiment, the present linker excludes such sequences
that encode (or are) Gly4Ser or repeats thereof.
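
One way to test a degenerate triplet against properties (i) to (iii) is to compare the sets of bases allowed at each pair of positions. The following is a minimal sketch under one possible reading, namely that two positions "cannot be the same nucleotide" when their allowed base sets do not overlap; that interpretation, like the function name, is an assumption and is not stated in the text.

```python
# Degenerate DNA codes as listed in Table 1.
DNA_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "y": "tc", "s": "gc", "w": "at",
    "v": "agc", "x": "cta", "n": "agct",
}


def triplet_properties(triplet):
    """Report which of properties (i)-(iii) a degenerate triplet satisfies.

    Assumed reading: a property holds when the two positions share no
    allowed base, so no concrete codon can carry the same nucleotide at both.
    """
    p1, p2, p3 = (set(DNA_CODES[base]) for base in triplet)
    return {
        "i": p1.isdisjoint(p2),    # position 1 vs position 2
        "ii": p2.isdisjoint(p3),   # position 2 vs position 3
        "iii": p1.isdisjoint(p3),  # position 1 vs position 3
    }


rst_check = triplet_properties("rst")
nvt_check = triplet_properties("nvt")
```

Because the properties are joined by "or", a triplet qualifies under this reading when at least one entry is true; both rst and nvt satisfy property (ii), since position 3 is fixed as t and t is excluded from position 2.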

As used herein, a "library of linkers" (or "linker library") at the nucleic acid
level is a set or collection or population of nucleic acid molecules or sequences each of
which joins two nucleic acid domains or two nucleic acid sequences encoding two
polypeptide domains, each library member of which has a pattern of degenerate repeated
triplet nucleotides with the following properties:

(i) position 1 of each repeated triplet cannot be the same nucleotide as position
2 of the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet; or

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of
the repeated triplet.

At the protein level, the linker library is the set of expression products of the population
of linker nucleic acid members of the library.

A "single-chain antibody" (scFv; also termed "scAb" by others) is a single 
chain polypeptide molecule wherein an Ig heavy chain variable (VH) domain and an Ig
light chain variable (VL) domain are artificially linked by a relatively short peptide
linker that allows the scFv to assume a conformation which retains binding capacity and
specificity for the antigen (or epitope) against which the original antibody (from
which the VH and VL domains are derived) was specific.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a Western blot analysis of scFv proteins generated in Example 1
in plant protoplasts. CJ is the scFv with the (Gly4Ser)3 linker. The number of the lane
refers to the # of the clone. The size in kilodaltons (kD) is shown on the left.

Figure 2 shows a Western blot analysis of scFv proteins generated in Example 2
in whole plants. CJ is the scFv with the (Gly4Ser)3 linker. The number of the lane
refers to the # of the clone. The size in kDa is shown on the left.

Figure 3 shows Coomassie stained SDS-PAGE analysis of scFv proteins 
generated in Example 3 in whole plants. The number of the lane refers to the # of the 
clone and the arrow indicates the scFv protein. The size in kDa is shown on the left. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention employs expression systems, preferably plant-based, to 
produce dual-domain proteins, for example, individualized tumor-specific immunogens 
for treating B cell lymphoma. The plant-based transient heterologous expression system 
described herein produces correctly folded polypeptides in surprisingly high abundance 
and with surprisingly potent immunogenicity. This system allows rapid and economical 
production of useful quantities of such proteins or polypeptides. 

The nucleic acid encoding the dual-domain product is introduced into plants 
using an appropriate plant virus vector, described in detail below, leading to expression 
and rapid production of appropriately folded dual-domain protein in plant cells, plant 
parts and whole plants. 

The selection of (1) appropriate linkers and (2) the transient expression system, 
as described herein, ensures that useful dual-domain polypeptide molecules are secreted 
by the plant cells in a form that is folded in solution in a conformation that permits their 
use for their intended purpose, e.g., as tumor-specific immunogens. An scFv produced 
according to this invention is advantageously obtained as the predominant secreted 
protein species in those plant cells into which it has been successfully incorporated. 
This permits simple selection and straightforward, rapid purification for the uses 
described herein, including as a vaccine composition. 

While plant expression systems are preferred for reasons enumerated herein, the 
invention is not intended to be limited to any particular system. The present approaches 
for generation of random linker libraries of varying degrees of complexity in the 
production of dual domain (or multi-domain) nucleic acids and proteins can be applied 
to other prokaryotic and eukaryotic hosts, for example bacteria, yeast cells or 
mammalian cells. 

In addition to the scFv vaccines comprising Ig V domains that are described 
below, the present invention can be applied directly to other protein antigens which can 
be expressed in plants in a similar manner to achieve proper folding and enhanced 
immunogenicity. Examples include antigens that are common to a particular type of 
tumor or family of tumors, such as carcinoembryonic antigen (CEA), prostate-specific 
antigen (PSA) present in prostate adenocarcinomas, tyrosinase present in melanomas, 
and many other known and yet undiscovered tumor antigens. Another type of clonally- 
distributed (self) antigen is a T cell receptor (TCR) domain that includes a portion of the 
α, β, γ or δ chain V region (or a combination thereof). Such TCR-based antigens can 
be markers and, therefore, targets in certain T cell leukemias and lymphomas as well as 
in autoimmune diseases. Thus, autoimmune diseases associated with identifiable T cell 
clones or with usage of a particular TCR chain V region are modulated/treated by 
immunizing with a polypeptide antigen corresponding to TCR V region polypeptides 
that is made by the approach described herein. 

Other dual domain proteins within the scope of the invention include a viral coat 
protein domain combined with another domain of interest. If necessary, this molecule is 
purified taking advantage of the coat protein's characteristics. 

The protein domains are not limited to those expressed on the cell surface; dual 
domain proteins wherein one or both polypeptides are derived from a cytosolic protein 
or a protein that functions in soluble form are also intended. Examples include 
cytokines such as IL-1β and polypeptide hormones. 

Other preferred polypeptide domains that are linked as dual- or multi-domain 
proteins using the linker approach of the present invention are transcription factors. 
These can be assembled so that active domains of different transcription factors that act 
in concert or sequentially are combined as single chain molecules separated by linkers. 
The linker size and complexity are chosen on the basis of the functional requirements for 
the transcription factors, e.g., the distance between the nucleic acid binding sites for 
these factors if they must bind and act at about the same time. Such dual domain or 
multi-domain polypeptides would be expected to show advantageous properties in 
promoting, activating or orchestrating transcriptional events. This may be particularly 
useful in cases where more than one factor must act and one is limiting in its 
concentration or availability. This limitation is overcome by creating an artificial dual 
domain or multi-domain transcription factor where the domain of the otherwise limiting 
factor is always linked to a domain or domains of one or more nonlimiting transcription 
factors. 

Alternatively, a transcription factor domain may be linked using the present 
approach to an inhibitory moiety such as a toxin so that binding of the transcription 
factor domain to its target DNA permits the toxin to perform its function and inhibit 
transcription or otherwise block a cellular function. Use of the stimulatory or inhibitory 
transcription factor constructs with linkers having the appropriate flexibility could 
permit the attainment of new levels of control over cellular functions not heretofore 
possible using mixtures of proteins or by protein domains that have been linked by a 
limited array of preselected individual linkers. The random linker library approach 
generates a much larger array of choices that can be selected by appropriate means as 
described herein. 

The dual- or multi-domain polypeptides prepared in accordance with this 
invention using the random linker library approach can be delivered to a target cell 
exogenously, or can be combined in an expression system that is inserted into the target 
cell and functions autonomously or under the control of cellular factors. This can be 
accomplished using routine methods of molecular biology and conventional vectors 
such as viral vectors that deliver the nucleic acid encoding the polypeptides to the 
appropriate cells by selective or nonselective means. 

The product of the present invention may be used in the form of a dual (or multi) 
domain nucleic acid molecule, for example, a bifunctional DNA vaccine that is intended 
for administration to a subject and, when expressed, produces an immunogenic dual 
domain protein in the subject. 

Unless otherwise indicated, the practice of the present invention employs 
conventional techniques of molecular biology, recombinant DNA technology and 
immunology, which are within the skill of the art. Such techniques are described in 
more detail in the references listed earlier. 

Focusing on a linker region L between two polypeptide domains, it may be 
difficult to predict what amino acid substitutions or additions will optimize a particular 
property of the linker, and therefore, of the multi-domain polypeptide as tested, for 
example, in a biochemical or biological assay. The length and the sequence of L can 
affect the activity of the polypeptide product because of an impact on properties such as 
solubility, folding and conformation, protease susceptibility or expression level. 

The present invention provides approaches for creating a nucleic acid library 
that, when expressed, results in a library of polypeptides with linker regions that are 
variable in both length and sequence. This invention permits a practitioner to create and 
analyze such libraries, thereby providing advantages over the prior art where either 
length or sequence, but not both, could be varied. 

The present invention is based on the use of known template nucleic acids that 
encode the protein domains of interest. The nucleic acid encoding a first domain is 
amplified in a PCR reaction using an upstream primer that is complementary to the 
antisense strand of the template and a downstream primer that is complementary to the 
sense strand of the template DNA and that may contain repeated triplets of nucleotides 
at its 5' end. 

Then the nucleic acid for the second domain is amplified in a PCR reaction with 
an upstream primer that is complementary to the antisense strand of the template DNA 
and that may have a repeated nucleotide triplet sequence at its 5' end and with a 
downstream primer that is complementary to the sense strand of the template DNA. 

To get the desired variability in length and sequence, the downstream 
primer for the first domain and/or the upstream primer for the second domain must 
contain the repeated triplet of nucleotides. The resulting two PCR products are then 
combined to form a nucleic acid that encodes a dual-domain protein, or contains the 
dual DNA or dual RNA domains that are linked by the linker region. The resultant 
molecules (protein, DNA or RNA) can then be analyzed by a variety of means known to 
those of skill in the art. 

The structures of proteins and nucleic acids and their domains are determined by 
well-known biochemical and biophysical methods, in particular X-ray crystallography 
and two-dimensional nuclear magnetic resonance (2D-NMR) spectroscopy. Inspection 
of a 3D structure may be sufficient to delineate a macromolecule's domains. For 
example, the 3D structure of the dimeric enzyme glutathione reductase illustrates that 
each subunit is composed of three structural domains: an FAD binding domain, an NADP 
binding domain and a third domain that forms the interface between the dimers. See 
Schulz et al., supra. The Ig VH and VL domains cooperate to form the antibody's 
antigen binding pocket. Thus these structural domains fold into distinct shapes that are 
important for the molecule's function. 

CLONING OF DOMAINS 

A domain may be isolated by any of a number of techniques. In general, a 
nucleic acid sequence encoding a polypeptide (or RNA) domain of interest is cloned 
from an appropriate cDNA library or a genomic DNA library based on hybridization 
with an oligonucleotide probe that represents the domain. 

For the present invention, preferred nucleic acids and proteins are mammalian, 
more preferably human, sequences. 

Alternatively, the DNA is isolated by amplification techniques using 
oligonucleotide primers starting with a DNA or RNA template (see, e.g., Dieffenbach 
et al., PCR Primer: A Laboratory Manual (1995)). These primers can be used to 
amplify either a full length coding sequence or a partial sequence that could constitute a 
probe (ranging in length up to about several thousand nucleotides). The resultant probe 
sequence is then used to screen a mammalian library for the full-length nucleic acid of 
interest. Use of synthetic oligonucleotide primers and amplification of an RNA or DNA 
template is described in U.S. Patents 4,683,195 and 4,683,202 and in PCR Protocols: A 
Guide to Methods and Applications (Innis et al., eds., 1990). Methods such as PCR and 
ligase chain reaction (LCR) can be used to amplify nucleic acid sequences of domains 
directly from mRNA, from cDNA, or from genomic or cDNA libraries. Degenerate 
oligonucleotides can be designed to amplify domain homologues using the known 
sequences that encode the domain. Restriction endonuclease sites can be incorporated 
into the primers. Genes amplified by the PCR reaction can be purified on agarose gels 
and cloned into an appropriate vector. 

In expression cloning, nucleic acids are isolated from expression libraries using 
as a probe an antibody (or other binding partner) specific for an epitope of the expressed 
polypeptide. Polyclonal or monoclonal antibodies (mAbs) can be raised by 
immunization with one or more peptide fragments of the domain being cloned. 

Nucleic acid probes, preferably oligonucleotides, are used, preferably under 
stringent hybridization conditions, to screen libraries in order to isolate polymorphic 
variants or alleles of the genes that encode the polypeptide domain of interest. 
Alternatively, antibody-based expression cloning permits cloning of polymorphic or 
allelic variants or interspecies homologues. 

Selection of sources for the cDNA library and its production from mRNA are 
done using conventional methods (Gubler et al., Gene 25:263-269 (1983); Sambrook et 
al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Current Protocols in 
Molecular Biology (Ausubel et al., eds., 1994 or latest edition)). 

Methods for preparing genomic DNA libraries are conventional in the art. For 
example, DNA extracted from a tissue may be mechanically sheared or enzymatically 
digested to yield fragments of about 12-20 kb that are separated by gradient 
centrifugation and inserted into appropriate expression vectors. These vectors are 
packaged into phage in vitro. Recombinant phage are analyzed by plaque hybridization 
(Benton et al., Science 196:180-182 (1977)). Colony hybridization is carried out, for 
example, as generally described by Grunstein et al., Proc. Natl. Acad. Sci. USA, 
72:3961-3965 (1975). 

Synthetic oligonucleotides can be used to construct recombinant "genes" for use 
as probes or for expression of the domain polypeptides. 

Oligonucleotides can be chemically synthesized using solid phase 
phosphoramidite triester methods (Beaucage et al., Tetrahedron Letts. 22:1859-1862 
(1981)) using an automated synthesizer (Van Devanter et al., Nucleic Acids Res. 
12:6159-6168 (1984)). Purification of oligonucleotides is typically by native 
acrylamide gel electrophoresis or by anion-exchange HPLC (Pearson et al., J. Chrom. 
255:137-149 (1983)). 

Sequences of cloned genes and synthetic oligonucleotides can be verified by 
conventional methods such as the chain termination method (Wallace et al., Gene 
16:21-26 (1981)) using a series of overlapping oligonucleotides usually 40-120 bp in 
length, representing both the sense and antisense strands of the gene. 

The nucleic acid encoding the desired polypeptide is typically cloned into an 
intermediate vector before transformation or transfection of prokaryotic or eukaryotic 
cells for replication and/or expression of the nucleic acid. These intermediate vectors, 
e.g., plasmids or shuttle vectors, are typically for use in prokaryotic cells. 

LINKER REGION 

Functions of the linker L are to join a first and a second polypeptide (or nucleic 
acid) domain as a single macromolecule and to permit the two domains to fold correctly 
and thereby assemble into a functional molecule. In the scFv embodiment where the amino 
acid linker L links the VH and VL domains, L may vary in length between 1 and about 
50 residues. An individual L preferably is composed of between 1 and about 20 
different amino acids, and each repeated pattern of degenerate triplet bases encodes 
between 1 and about 12 different amino acids. An optimal linker contributes 
significantly to the correct folding of the VH and VL domains so that the resulting scFv 
(a) is soluble and (b) binds antigen or (c) is able to act as an antigen to elicit a relevant 
immune response. 

In one embodiment the linker will be resistant to cleavage by proteases that the 
final product is expected to encounter when being used. 

In contrast, the linker may also be designed to incorporate an amino acid or short 
sequence that serves as a cleavable site for a protease that can be used to separate 
one or more domains from one another at an appropriate time. 

Additionally, the linker may be designed to confer affinity to another molecule 
or matrix, facilitating subsequent purification of the expressed fused domains 
based on the properties of the linker. One example includes incorporation of a histidine 
(His) tag that permits purification on a metal (e.g., nickel) affinity column. Other 
affinity tags are well-known in the art and need not be described here. 

Depending on the two domains being linked, the sequence and length of L can 
vary widely. 

Linkers may be selected based on their ability to fuse two polypeptide domains 
and at the same time, facilitate purification and characterization based on the properties 
of one (or both) domains. Examples include fusions of a selected protein domain and 
glutathione S-transferase (GST), which can then be purified on an affinity matrix of 
glutathione-agarose (Smith et al. (1988) Gene, 67:31-40). The linker used by Smith et 
al. was later modified by Guan et al. (Anal. Biochem. 192:262-267 (1991)) to introduce a 
glycine rich stretch known as a "glycine kinker" having the amino acid sequence 
PGISGGGGG [SEQ ID NO:1]. Such a linker, within the scope of this invention, 
facilitates the cleavage of GST from its fusion partner (in that example, a protein 
tyrosine phosphatase). 

Vectors for producing these kinds of fusion proteins are well-known in the art, 
and many are commercially available. For example, New England Biolabs provides 
pMAL-p2, a vector that encodes a maltose binding protein that can be fused to a domain 
sequence that is cloned into the vector. In pMAL-p2, the amino acid sequence of the 
linker between the maltose-binding protein and the added domain is 
NNNNNNNNNNLGIEGR [SEQ ID NO:2]. The stretch of asparagines facilitates 
purification of the fusion protein on an amylose affinity column. 

A linker that has been used to link Ig VH and VL domains into an scFv is the 15 
amino acid sequence GGGGSGGGGSGGGGS (SEQ ID NO:3), commonly designated 
(Gly4-Ser)3. A number of other linkers for scFv production have been described in 
Lawrence et al., FEBS Letters, 425:479-484 (1998), Solar et al., Protein Engineering, 
8:717-723 (1995), Alfthan et al., Protein Engineering, 8:725-731 (1995), Newton et al., 
Biochemistry, 35:545-553 (1996), Ager et al., Human Gene Therapy, 7:2157-2164 
(1996) and Koo et al., Applied and Environmental Microbiology, 64:2490-2496 (1998). 
The library approach of this invention will generate many useful linkers beyond those 
noted above. 

Creation of Variable Length and Sequence in the Linker Region 

A preferred approach is to create a library of two domain polypeptides (D1-L-D2) 
wherein each library member varies from all others in L. In other words, randomness 
between the domains is found in the linkers that link them. This permits the generation 
of an array of D1-L-D2 products, particularly in a plant expression system, from which 
one can select one, or an array, of optimally folded, optimally functioning products. 

In this approach, two cloned domains are amplified and a linker of variable 
length and variable sequence is introduced between them using an amplification method 
such as PCR. To achieve this, a portion of the 3' end of the downstream primer for the 
upstream domain and the 3' end of the upstream primer for the downstream domain are 
complementary to the respective domain sequence being amplified. ("Downstream" and 
"upstream" are relative to the linker). However, a portion of the 5' end of the 
downstream primer for the upstream domain and/or the 5' end of the upstream primer 
for the downstream domain are not complementary to the respective domain being 
amplified. This noncomplementary segment of the primers, termed a "nontemplated 
sequence," contains a repeated pattern of degenerate triplet bases which, at the nucleic 
acid level, join the upstream to the downstream domain. 
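
The primer layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two domain sequences, the 18-nucleotide annealing length and the function names are hypothetical placeholders and are not taken from the Examples. The sketch appends a nontemplated run of degenerate triplets to the 5' end of each primer and keeps the templated portion at the 3' end.

```python
# Sketch of primer construction with a nontemplated degenerate-repeat tail.
# All sequences and the 18-nt annealing length are hypothetical placeholders.

# IUPAC complements, including the degenerate codes (r, y, s, ...).
COMPLEMENT = {
    "a": "t", "c": "g", "g": "c", "t": "a",
    "r": "y", "y": "r", "s": "s", "w": "w",
    "k": "m", "m": "k", "b": "v", "v": "b",
    "d": "h", "h": "d", "n": "n",
}


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def upstream_primer_for_d2(d2_sense, triplet, repeats, anneal_len=18):
    """Primer for the downstream domain: 5' repeats + start of D2 (sense)."""
    if repeats < 2:
        raise ValueError("at least two identical 5' triplet repeats are needed")
    return triplet.lower() * repeats + d2_sense[:anneal_len].lower()


def downstream_primer_for_d1(d1_sense, triplet, repeats, anneal_len=18):
    """Primer for the upstream domain: 5' repeats written on the antisense
    strand, followed by the reverse complement of the 3' end of D1."""
    if repeats < 2:
        raise ValueError("at least two identical 5' triplet repeats are needed")
    tail = reverse_complement(triplet) * repeats
    return tail + reverse_complement(d1_sense[-anneal_len:])


# Hypothetical domain sequences, written 5'->3' on the sense strand.
D1 = "atggctgaagtgcagctggtggaatct"
D2 = "catgccgacatccagatgacccagtct"

print(upstream_primer_for_d2(D2, "rst", 6))
print(downstream_primer_for_d1(D1, "rst", 6))
```

Both primers are written 5' to 3'. Because the downstream primer for the upstream domain anneals to the sense strand, its repeat is the reverse complement of the sense-strand triplet (for rst this is asy), so the linker still reads as rst repeats on the coding strand.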

The upstream and downstream primers for amplifying D1 and D2 are mixed with 
a DNA polymerase and other necessary reactants for amplification. See Innis et al., 
supra, for details. The reaction mixture is subjected to multiple temperature cycles to 
melt DNA duplexes, allow annealing of primers to template and polymerization of the 
PCR product. During the first cycle the DNA polymerase carries out "first strand" 
synthesis until the temperature is raised sufficiently to melt the duplexes. Thereafter, 
when the temperature is lowered to the annealing temperature, the primers will anneal to 
the first strand DNA. The DNA polymerase will then make a "second strand" as the 
polymerization temperature of the cycle is reached. This results in exponential 
accumulation of the domain being amplified. Because of the nontemplated sequences, 
the amplified domain-encoding DNA will form a population (library) of molecules with 
a repeated pattern of degenerate bases at the 3' end of the upstream product and the 5' 
end of the downstream product. 

Due to the nature of the repeated pattern of degenerate triplet bases in the 
nontemplated sequences of the amplification pairs, the PCR products are diverse in 
sequence and length in the L region. The length diversity is most likely due to duplex 
formation of the L region of the primers with bubbles or loops in the middle due to base 
pair mismatching. The 3'-5' exonuclease and the 5'-3' polymerase activities serve to 
delete or extend the length of the primer sequence. 

To shorten the L sequence, a primer containing the repeated triplet is annealed to 
a complementary strand that has already incorporated the L sequence. The degenerate 
primer can then anneal to form a duplex with a bubble at the site of unpaired bases, and 
leave an unpaired 3' extension (overhang), as diagrammed below (underscored). 

Duplex with bubble and 3' overhang 

       RST-RST-RST-RST-RST 
      /                   \ 
5' RST                     CAT-GCC 3' 
   |||                     ||| ||| 
3' YSA-YSA-YSA-YSA-YSA-YSA-GTA-CGG 5' 

(upper sequence is SEQ ID NO:50; lower sequence is SEQ ID NO:51) 

An enzyme such as Pfu or Vent that has 3'-5' exonuclease activity will degrade 
the 3' extension in the 5' direction of the complementary strand until it reaches the 
annealed portion of the duplex. In this manner one or more triplet repeats can be 
removed from the PCR product, thereby shortening the peptide linker L by one (or 
more) amino acids. 

For extension of the linker L, the "top" strand can anneal to the complementary 
strand so that a duplex with a 5' extension is formed, as follows: 

Duplex with bubble and 5' overhang 

5' RST-RST-RST-RST-RST-RST-CAT-GCC 3' 
   |||                     ||| ||| 
3' YSA                     GTA-CGG 5' 
      \                   / 
       YSA-YSA-YSA-YSA-YSA 

(upper sequence is SEQ ID NO:50; lower sequence is SEQ ID NO:51) 

The polymerase present in the amplification reaction, e.g., Taq polymerase, can 
extend the PCR product by one or more triplet repeat codons. Because of its 5'-3' 
polymerase activity, the enzyme can fill in the 5' extension, thereby lengthening the 
linker region by one or more repeated triplets. This will extend the peptide linker by 
one or more amino acids. If the polymerase in the PCR lacks 3'-5' exonuclease activity, 
and if no enzyme with 3'-5' exonuclease activity is present, then only extensions of 
triplet nucleotides should occur. 

To promote bubble formation, the 5' end of at least one primer must contain the 
same degenerate bases in at least two terminal codons to prevent slippage. That is, there 
must be two triplet repeats with the same sequence (e.g., 5' rst-rst 3' or 
5' ysa-ysa 3', etc.) at the 5' end of at least one of the primers used to amplify a 
domain. 

To retain the proper reading frame, which is important if the fused nucleic acid is 
to express a protein (as is the case with an scFv), several rules should be observed in 
designing the degeneracy of the nontemplated region of the primers that will be the L 
region. The degenerate triplet repeats should obey one of the following rules: 

(a) position 1 of the triplet cannot contain the same base as position 2; or 

(b) position 2 of the triplet cannot contain the same base as position 3; or 

(c) position 1 of the triplet cannot contain the same base as position 3. 

For example, a repeated triplet rst and ysa will obey these rules. The following 
combinations of bases fulfill those rules: rst = agt, act, ggt, gct and ysa = tca, 
tga, cca, cga. Other degenerate sequences can also fulfill these rules. For example, 
str (which can be gta, gtg, cta, or ctg) or ayr (which can be aca, acg, ata or 
atg) could serve as a repeated triplet. 

Another degenerate triplet sequence useful in this invention is nvt, which can be 
any of 12 different codons encoding 11 different amino acids. The degenerate triplet 
nws can be any of 16 different codons encoding 12 different amino acids. The 
degenerate triplet csy does not adhere to these rules because it could be ccc (which 
does not comply). Similarly, any other degenerate sequence that can be a triplet of 
identical bases (i.e., ccc, aaa, ggg, or ttt) would not obey these rules and would 
thus be excluded as a repeated triplet. 
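
The design rules can be checked mechanically. The following Python sketch is offered as an illustration only. It assumes one particular reading of the rules, namely that a rule is satisfied when the two positions it names have no possible base in common, and that a degenerate triplet is acceptable when it satisfies at least one of rules (a), (b) or (c). The function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def expand(triplet):
    """List every concrete codon that a degenerate triplet can represent."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in triplet.lower()))]


def satisfied_rules(triplet):
    """Return which of rules (a), (b), (c) the triplet satisfies.

    Assumed reading: a rule holds when the two positions it compares
    share no possible base.
    """
    p1, p2, p3 = (set(IUPAC[b]) for b in triplet.lower())
    rules = []
    if not p1 & p2:
        rules.append("a")  # position 1 never equals position 2
    if not p2 & p3:
        rules.append("b")  # position 2 never equals position 3
    if not p1 & p3:
        rules.append("c")  # position 1 never equals position 3
    return rules


def is_allowed(triplet):
    """A triplet is accepted if it satisfies at least one rule."""
    return bool(satisfied_rules(triplet))


for name in ("rst", "ysa", "str", "ayr", "nvt", "nws", "csy"):
    print(name, len(expand(name)), satisfied_rules(name), is_allowed(name))
```

Under this reading, rst, ysa, str, ayr, nvt and nws each satisfy at least one rule, whereas csy satisfies none, which agrees with the observation that csy can be ccc.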

Restriction enzyme recognition sequences can be incorporated into the primers 
to facilitate cloning and orientation of, for example, the Ig V region domains (or any 
other polypeptide domains) with respect to each other. For example, a restriction 
endonuclease site may be incorporated in the 5' end of the upstream amplification 
primer for the D1 domain, which will facilitate ligation of the 5' end of the upstream 
domain to the 5' end of a restricted vector into which that fragment is being subcloned. 
Likewise the same or a different restriction site can be incorporated in the 5' end of the 
downstream amplification primer for the downstream domain. The resulting PCR 
product can then be restricted with the respective endonuclease(s) for subsequent 
ligation into a vector that has complementary sequence(s) to the PCR products. 
Alternatively the same restriction site can be used, and the subclones can be screened by 
DNA sequencing, PCR, restriction enzyme digestion, etc., to determine if the correct 
orientation has been achieved. 

Ligation of the PCR products 

The 3' end of the upstream PCR product and the 5' end of the downstream PCR 
product can be ligated to one another (Methods in Enzymology: Guide to Molecular 
Cloning Techniques, Berger et al., eds., 1987). If both ends of these products are blunt, 
the 5' ends can be phosphorylated by T4 polynucleotide kinase and the reaction 
products ligated with T4 DNA ligase. If the ends of the PCR products are 
complementary or can be made complementary through restriction endonuclease 
digestion, then a sticky end ligation can be performed wherein the complementary ends 
are ligated with T4 DNA ligase. Likewise the 5' end of the upstream PCR product 
and/or the 3' end of the downstream PCR product can be ligated to a restricted vector in 
a blunt end or a sticky end ligation. 

To increase the sequence and length complexity of the linker region of the 
population of dual-domain molecules, such as an scFv, multiple PCR reaction products 
of D1 and D2 can be combined. For example, a PCR reaction of D1 and/or D2 where the 
degenerate triplet is repeated six times can be combined with PCR reactions of the D1 
and/or D2 where the degenerate triplet is repeated nine times and ligated into the 
appropriate vector. The combination of the PCR products will increase the length and 
sequence complexity observed in the L region. 

The complexity of the linker sequences obtained in the population or "library" 
can be pre-determined by the number of different amino acids designed into the 
nontemplated sequence of the PCR amplification primers used to amplify the domains. 
The number of amino acids encoded by the nontemplated sequence is determined by the 
nucleotide degeneracy designed into each codon triplet. 
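
As a rough guide to how the designed degeneracy translates into library size, the short Python sketch below counts the distinct linker peptide sequences that a given design could encode. It is an idealized upper bound under stated assumptions: every amino acid is equally available at every repeat, and only the designed repeat numbers are counted, so the additional lengths produced during amplification are ignored. The function names are hypothetical.

```python
def linker_count(n_amino_acids, n_repeats):
    """Distinct peptide linkers for one design: choices ** positions."""
    return n_amino_acids ** n_repeats


def pooled_linker_count(n_amino_acids, repeat_numbers):
    """Distinct linkers when PCR products with different repeat numbers
    are pooled; linkers of different lengths never coincide."""
    return sum(linker_count(n_amino_acids, n) for n in set(repeat_numbers))


# Two-amino-acid design, six repeats.
print(linker_count(2, 6))
# Six-amino-acid design, six repeats pooled with nine repeats.
print(pooled_linker_count(6, [6, 9]))
```

For a design encoding two amino acids repeated six times this gives 2^6 = 64 possible linkers, while pooling six-repeat and nine-repeat products of a six-amino-acid design gives 6^6 + 6^9 = 10,124,352.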

In one example, the desired complexity of the linker sequence present in a 
library is limited to two amino acids, Ala and Gly. The nontemplated sequence 
preferred for this linker combination would be repeats of the codon triplet gst 
(= gct and ggt), where gct encodes Ala and ggt encodes Gly. 

In a second example, the desired complexity of the linker sequence present in a 
library is increased to six amino acids, Ala, Gly, Ser, Thr, Asn and Asp. The 
nontemplated sequence preferred for this linker combination would be repeats of the 
codon triplet rvt (= gct, ggt, agt, act, aat and gat), wherein the 
following amino acids are encoded: 

gct-Ala    ggt-Gly    aat-Asn 
agt-Ser    act-Thr    gat-Asp 
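
The set of amino acids encoded by a given degenerate codon can be derived from the standard genetic code. The Python sketch below is illustrative only. The function name is hypothetical, and the codon table is the standard code written compactly with the bases in t, c, a, g order.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}

# Standard genetic code, one-letter amino acids, '*' marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def encoded_amino_acids(triplet):
    """Map each concrete codon of a degenerate triplet to its amino acid."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in triplet.lower())))
    return {codon: CODON_TABLE[codon] for codon in codons}


for name in ("gst", "rvt", "nvt", "nws"):
    mapping = encoded_amino_acids(name)
    residues = sorted(set(mapping.values()) - {"*"})
    print(name, len(mapping), "codons ->", "".join(residues))
```

For gst the sketch reports gct (A) and ggt (G). For rvt it reports six codons: aat (N), act (T), agt (S), gat (D), gct (A) and ggt (G). Stop codons, where a design permits them, are removed from the printed residue list but remain in the returned mapping.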

The same approaches are used to generate multi-domain polypeptides of higher 
order, e.g., three- or four-domain polypeptides. These can comprise all different 
domains or one or more domains can be repeated. General structures for such molecules 
are as follows (where D is a polypeptide domain and L is a linker): 

D1-L1-D1 
D1-L1-D2 
D1-L1-D2-L2-D2 
D1-L1-D2-L2-D3 
D1-L1-D2-L2-D3-L3-D4 
etc. 

The different linkers between the various domains can vary in complexity. This will 
depend on the structural relationship required for the proper function of each domain for 
its intended purpose. Thus, in the example of an scFv molecule with a single idiotype or 
with a single ligand-binding specificity, the two domains must function in concert for 
proper binding. In a 3-domain polypeptide which is an scFv of desired binding 
specificity wherein the third domain D3 is a toxin, there are fewer constraints on the 
"interaction" between the toxin domain and either of the two binding domains. In that 
case, the linker L2 between one of the scFv domains and the toxin domain can be 
different from, and less complex than, the linker L1 between the two domains (D1 and D2) 
that comprise the scFv polypeptide. 

In a library of multi-domain polypeptides, not every pair of domains is 
1 5 necessarily be joined by a linker according to the present invention. Thus, two or more 

adjacent domains may be (1) linked directly as may occur in their native state (if they 
are derived from naturally dual- or multi-domain proteins), or (2) linked by a 
"conventional" linker well-known in the art. In yet another embodiment, a particular 
linker identified using the present invention and derived as a member of a random linker 
20 library may be a preferred choice for use as a non-random linker between two given 

domains in a multi-domain polypeptide. These various embodiments can be depicted in 
the following (non-limiting) manner: 

D1-L1-D2-D3 
D1-L1-D2-D3-D4 
25 D,-L r D2-D3-D4-D 5 

D1-L1-D2-D3-D4- L 2 -D 5 
etc.

In the four formulas shown above, L1 and L2 indicate random linker members of the
libraries of the present invention. All other domains shown bonded to adjacent domains
without a linking L may be (1) directly bonded to one another as described above; (2)
linked by a conventional linker known in the art; or (3) linked by a fixed linker
discovered in a random linker library according to this invention but inserted as a
predetermined, non-random, non-varying linker in the particular location. As noted in
the Summary section above, a multi-domain polypeptide herein may be composed of up
to about 20 domains. For example, a 10-domain polypeptide may have anywhere
between 1 and 9 linkers L according to this invention. If a 10-domain polypeptide has
only one such linker L1 linking two domains, the other 8 domains are either directly
bonded to one another or linked by conventional or other predetermined linker groups.
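
The arithmetic behind the 10-domain example can be made concrete. The following
Python sketch is illustrative only and is not part of the disclosed method. It assumes
that each of the n-1 junctions between adjacent domains either carries a library linker L
or does not; the function name is hypothetical.

```python
from math import comb


def library_linker_placements(n_domains: int) -> dict:
    """Map k -> number of ways to place k library linkers L among the
    n_domains - 1 junctions between adjacent domains (assumption: a
    junction either carries a library linker or it does not)."""
    junctions = n_domains - 1
    return {k: comb(junctions, k) for k in range(1, junctions + 1)}


placements = library_linker_placements(10)
print(placements[1])             # positions available for a single linker L1
print(sum(placements.values()))  # all arrangements with 1 to 9 linkers L
```

For a 10-domain polypeptide there are 9 junctions, so a single library linker can sit in
any of 9 positions, and 2^9 - 1 = 511 arrangements contain at least one library linker.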

Expression System for Production of the Dual-domain Polypeptide 

A number of well-known heterologous expression systems in bacterial, insect,
mammalian and plant cells were discussed above, each with its advantages and
disadvantages. The present invention is particularly suited for plant expression.

A number of transformation methods permit expression of heterologous proteins
in plants. Some involve the construction of a transgenic plant by integrating DNA
sequences encoding the protein of interest into the plant genome. The time it takes to
obtain transgenic plants may be too long for the rapid production of certain embodiments
such as a tumor vaccine polypeptide. An attractive solution (an alternative to such
stable transformation) is transient transfection of plants with expression vectors. Both
viral and non-viral vectors capable of such transient expression are available (Kumagai,
M.H. et al. (1993) Proc. Nat. Acad. Sci. USA 90:427-430; Shivprasad, S. et al. (1999)
Virology 255:312-323; Turpen, T.H. et al. (1995) BioTechnology 13:53-57; Pietrzak, M.
et al. (1986) Nucleic Acids Res. 14:5857-5868; Hooykaas, P.J.J. and Schilperoort, R.A.
(1992) Plant Mol. Biol. 19:15-38), although viral vectors are easier to introduce into
host cells and spread by infection to amplify the expression, and are therefore preferred.

Chimeric genes, vectors and recombinant viral nucleic acids of this invention are
constructed using conventional techniques of molecular biology. A viral vector that
expresses heterologous proteins in plants preferably includes (1) a native viral
subgenomic promoter (Dawson, W.O. et al. (1988) Phytopathology 78:783-789 and
French, R. et al. (1986) Science 231:1294-1297), (2) preferably, one or more non-native
viral subgenomic promoters (Donson, J. et al. (1991) Proc. Nat. Acad. Sci. USA
88:7204-7208 and Kumagai, M.H. et al. (1993) Proc. Nat. Acad. Sci. USA 90:427-430),
(3) a sequence encoding viral coat protein (native or not), and (4) nucleic acid encoding
the desired heterologous protein. Vectors that include only non-native subgenomic
promoters may also be used. The minimal requirement for the present vector is the
combination of a replicase gene and the coding sequence that is to be expressed, driven
by a native or non-native subgenomic promoter. The viral replicase is expressed from
the viral genome and is required for extrachromosomal replication. The subgenomic
promoters allow the expression of the foreign or heterologous coding sequence and any
other useful genes such as those encoding viral proteins that facilitate viral replication,
proteins required for movement, capsid proteins, etc. The viral vectors are encapsidated
by the encoded viral coat proteins, yielding a recombinant plant virus. This
recombinant virus is used to infect appropriate host plants. The recombinant viral
nucleic acid can thus replicate, spread systemically in the host plant and direct RNA and
protein synthesis to yield the desired heterologous protein in the plant. In addition, the
recombinant vector maintains the non-viral heterologous coding sequence and control
elements for periods sufficient for desired expression of this coding sequence.
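
The distinction between the minimal vector and the preferred vector can be summarized
as a simple checklist. The following Python sketch is a hypothetical bookkeeping aid,
not a description of any actual vector-design software; the class name, field names and
the two example designs are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViralVectorDesign:
    """Hypothetical checklist of the vector elements named in the text."""
    has_replicase_gene: bool
    heterologous_coding_sequence: Optional[str]
    native_subgenomic_promoters: int = 0
    non_native_subgenomic_promoters: int = 0
    coat_protein: Optional[str] = None  # "native", "foreign", or None

    def meets_minimal_requirement(self) -> bool:
        # Replicase gene plus a coding sequence driven by at least one
        # (native or non-native) subgenomic promoter.
        promoters = (self.native_subgenomic_promoters
                     + self.non_native_subgenomic_promoters)
        return (self.has_replicase_gene
                and bool(self.heterologous_coding_sequence)
                and promoters >= 1)

    def has_preferred_elements(self) -> bool:
        # Elements (1) to (4) of the preferred vector described above.
        return (self.meets_minimal_requirement()
                and self.native_subgenomic_promoters >= 1
                and self.non_native_subgenomic_promoters >= 1
                and self.coat_protein is not None)


minimal = ViralVectorDesign(True, "scFv", non_native_subgenomic_promoters=1)
preferred = ViralVectorDesign(True, "scFv", 1, 1, coat_protein="foreign")
print(minimal.meets_minimal_requirement(), minimal.has_preferred_elements())
print(preferred.meets_minimal_requirement(), preferred.has_preferred_elements())
```

The first design satisfies only the minimal requirement (a vector with only a non-native
subgenomic promoter), while the second carries all four preferred elements.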

The recombinant viral nucleic acid is prepared from the nucleic acid of any
suitable plant virus, though members of the tobamovirus family are preferred. The
native viral nucleotide sequences may be modified by known techniques provided that
the necessary biological functions of the viral nucleic acid (replication, transcription,
etc.) are preserved. As noted, one or more subgenomic promoters may be inserted.
These are capable of regulating expression of the adjacent heterologous coding
sequences in an infected or transfected plant host. Native viral coat protein may be
encoded by this RNA, or this coat protein sequence may be deleted and replaced by a
sequence encoding a coat protein of a different plant virus ("non-native" or "foreign
viral"). A foreign viral coat protein gene may be placed under the control of either a
native or a non-native subgenomic promoter. The foreign viral coat protein should be
capable of encapsidating the recombinant viral nucleic acid to produce functional,
infectious virions. In a preferred embodiment, the coat protein is a foreign viral coat
protein encoded by a nucleic acid sequence that is placed adjacent to either a native viral
promoter or a non-native subgenomic promoter. Preferably, the nucleic acid encoding
the heterologous protein, e.g., an immunogenic polypeptide to be expressed in the plant,
is placed under the control of a native subgenomic promoter.

An important element of this invention, which is responsible in part for the proper
folding and copious production of the heterologous protein (exemplified as the
immunogenic scFv polypeptide), is the presence of a signal peptide sequence that directs
the newly synthesized protein to the plant secretory pathway. The sequence encoding
the signal peptide is fused in frame with the DNA encoding the polypeptide to be
expressed. A preferred signal peptide is the α-amylase signal peptide.

In another embodiment, a sequence encoding a movement protein is also
incorporated into the viral vector because movement proteins promote rapid cell-to-cell
movement of the virus in the plant, facilitating systemic infection of the entire plant.

Either RNA or DNA plant viruses are suitable for use as expression vectors. The
DNA or RNA may be single- or double-stranded. Single-stranded RNA viruses
preferably have a plus strand, though a minus strand RNA virus is also intended.

The recombinant viral nucleic acid is prepared by cloning in an appropriate
production cell. Conventional cloning techniques (for both DNA and RNA) are well
known. For example, with a DNA virus, an origin of replication compatible with the
production cell may be spliced to the viral DNA.

With an RNA virus, a full-length DNA copy of the viral genome is first prepared by
conventional procedures: for example, the viral RNA is reverse transcribed to form
subgenomic pieces of DNA which are rendered double-stranded using DNA polymerases.
The DNA is cloned into an appropriate vector and inserted into a production cell. The
DNA pieces are mapped and combined in proper sequence to produce a full-length DNA
copy of the viral genome. Subgenomic promoter sequences (DNA), with or without a coat
protein gene, are inserted into non-essential sites of the viral nucleic acid as described
herein. Non-essential sites are those that do not affect the biological properties of the viral
nucleic acid or the assembled plant virion. cDNA complementary to the viral RNA is
placed under control of a suitable promoter so that (recombinant) viral RNA is produced in
the production cell. If the RNA must be capped for infectivity, this is done by conventional
techniques.

Examples of suitable promoters include the lac, lacUV5, trp, tac, Ipl and ompF
promoters. A preferred promoter is the phage SP6 promoter or T7 RNA polymerase
promoter.

Production cells can be prokaryotic or eukaryotic and include Escherichia coli,
yeast, plant and mammalian cells.

Numerous plant viral vectors are available and well known in the art (Grierson,
D. et al. (1984) Plant Molecular Biology, Blackie, London, pp. 126-146; Gluzman, Y. et
al. (1988) Communications in Molecular Biology: Viral Vectors, Cold Spring Harbor
Laboratory, New York, pp. 172-189). The viral vector and its control elements must
obviously be compatible with the plant host to be infected. Suitable viruses are

(a) those from the tobacco mosaic virus (TMV) group, such as TMV, tobacco mild
green mosaic virus (TMGMV), cowpea mosaic virus (CMV), alfalfa mosaic virus
(AMV), Cucumber green mottle mosaic virus - watermelon strain (CGMMV-W),
oat mosaic virus (OMV),

(b) viruses from the brome mosaic virus (BMV) group, such as BMV, broad bean
mottle virus and cowpea chlorotic mottle virus,

(c) other viruses such as rice necrosis virus (RNV), geminiviruses such as Tomato
Golden Mosaic virus (TGMV), Cassava Latent virus (CLV) and Maize Streak virus
(MSV).

A preferred host is Nicotiana benthamiana. The host plant, as the term is used
here, may be a whole plant, a plant cell, a leaf, a root, a shoot, a flower or any other plant
part. The plant or plant cell is grown using conventional methods.

A preferred viral vector for use with N. benthamiana is expression vector
pBSG1250 (pTTOSA derivative) containing a hybrid fusion of TMV and tomato mosaic
virus (ToMV) (Kumagai, M.H. et al. (1995) Proc. Natl. Acad. Sci. USA 92:1679-1683).
The inserted subgenomic promoters must be compatible with TMV nucleic acid and
capable of directing transcription of properly situated (e.g., adjacent) nucleic acid
sequences in the infected plant. The coat protein should permit the virus to systemically
infect the plant host. TMV coat protein promotes systemic infection of N.
benthamiana.

Infection of the plant with the recombinant viral vector is accomplished using a
number of conventional techniques known to promote infection. These include, but are
not limited to, leaf abrasion, abrasion in solution and high velocity water spray. The
viral vector can be delivered by hand, mechanically or by high pressure spray of single
leaves.

Purification of the Protein/Polypeptide Product

The dual-domain polypeptide produced in plants is preferably recovered and
purified using standard techniques. Suitable methods include homogenizing or grinding
the plant or the producing plant parts in liquid nitrogen followed by extraction of
protein. If for some reason it is not desirable to homogenize the plant material, the
polypeptide can be removed by vacuum infiltration and centrifugation followed by
sterile filtration. Protein yield may be estimated by any acceptable technique.
Polypeptides are purified according to size, isoelectric point or other physical property.
Following isolation of the total secreted proteins from the plant material, further
purification steps may be performed. Immunological methods such as
immunoprecipitation or, preferably, affinity chromatography, with antibodies specific
for epitopes of the desired polypeptide may be used.

To facilitate purification, the viral vector can be engineered so that the protein is
produced with an affinity tag that can be exploited at the purification stage. An
example of such a tag is the histidine (His) tag that permits purification on a metal
(e.g., nickel) affinity column. Other affinity tags are well-known in the art and need not
be described here.

Various solid supports may be used in the present methods: agarose®,
Sephadex®, derivatives of cellulose or other polymers. For example, staphylococcal
protein A (or protein L) immobilized on Sepharose® can be used to isolate the target
protein by first incubating the protein with specific antibodies in solution and then
contacting the mixture with the immobilized protein A, which binds and retains the
antibody-target protein complex.

Using any of the foregoing or other well-known methods, the polypeptide is
purified from the plant material to a purity of greater than about 50%, more preferably
greater than about 75%, even more preferably greater than about 95%.

Determination of Correct Folding 

Critical for certain properties such as immunogenicity is the protein's
conformation in solution. The conformation of the relevant epitopes of the dual-domain
polypeptide in solution preferably resembles or mimics that of the same epitopes of the
native protein. By producing polypeptides in plants, and targeting them to the plant's
secretory pathway, the present invention ensures that the polypeptide is secreted in
soluble, optimally folded form.

A preferred reagent to be used in determining proper folding is a specific
antibody, preferably a mAb, which (1) binds to an epitope of the polypeptide when the
chains are correctly folded but (2) does not bind when the epitopes are denatured. The
antibody is employed in any of a number of immunological assays, including dot blot,
Western blot, immunoprecipitation, radioimmunoassay (RIA), and enzyme
immunoassays (EIA) such as enzyme-linked immunosorbent assays (ELISA). In
preferred embodiments, when such antibodies are available, Western blots and ELISAs
are employed to verify correct folding of the relevant parts of the dual-domain (or
multi-domain) polypeptide produced in the plant.

Additional Analysis of the Dual-Domain Molecule 

DNA encoding the dual domain polypeptide can be sequenced, yielding a
deduced amino acid sequence of its encoded product. If the DNA molecule has been
subcloned, it can be excised from the vector with a restriction enzyme and the resulting
fragments analyzed on agarose gels to determine the size of the fragments.

If the DNA molecule itself has the binding domains of interest, the subcloned
DNA molecule (or excised fragment) can be assayed for binding to the relevant ligand.

If the DNA molecule encodes a dual-domain ribozyme, then the ribozyme RNA
can be transcribed from the vector. The coding sequence can be excised with restriction
enzymes and contacted with an RNA polymerase (along with ribonucleotides and other
required factors) to transcribe the dual-domain RNA. The ribozyme can then be
quantified and its enzymatic activity measured in an appropriate assay.

A DNA molecule encoding a dual-domain polypeptide is first expressed. If 
desired, the DNA can be additionally modified to include sequences that will permit or 
optimize expression in an appropriate host or in an in vitro transcription/translation 
system. Once expressed, the polypeptide is then subjected to appropriate functional 
assays, e.g., measurement of enzymatic activity (of either domain). Also the quantity 
and physical properties of the dual domain polypeptide can be determined, e.g., by SDS- 
PAGE. Electrophoretic separation can be followed by direct staining of protein or by 
Western blotting and probing with an appropriate antibody that recognizes an epitope of 
either domain. If a domain has binding activity, or other functions as have been 
described above, this can also be measured by conventional means. 

Having now generally described the invention, the same will be more readily 
understood through reference to the following examples which are provided by way of 
illustration, and are not intended to be limiting of the present invention, unless specified. 

The following examples are provided by way of illustration only and not by way 
of limitation. Those of skill will readily recognize a variety of noncritical parameters 
which could be changed or modified to yield essentially similar results. 

EXAMPLE 1 

Generation of a Self-Tumor Antigen from a Single Patient (CJ) that Includes
the Idiotype of CJ B Cell Lymphoma

The immunogenic scFv protein designated "CJ" was derived from a human
lymphoma patient (having the initials CJ) and had as its linker (Gly4Ser)3. Patient CJ
had been treated in an earlier passive immunotherapy trial. The CJ molecule
(specifically, its V region epitope or epitopes) is recognized by an anti-Id mAb named
7D11. See also McCormick, A.A. et al., Proc. Natl. Acad. Sci. USA (1999) 96:703-708.

In an initial attempt to make a human scFv polypeptide, CJ V region genes were
sequenced and cloned into a bacterial expression system using a (Gly3Ser)4 linker.
Although targeted to the periplasm with a PEL-b leader, CJ scFv protein was
sequestered in insoluble inclusion bodies. When mice were immunized with CJ scFv
made in bacteria, no anti-CJ anti-idiotype antibody responses were detected.

Derivatives of CJ were generated by producing linkers having random length
and sequence as part of the general PCR-based cloning strategy described herein.

Four reactions were carried out. In the first and second, the sequence encoding
the VH domain was amplified from a cDNA clone of the lymphoma cells from patient
CJ using the following synthetic oligonucleotides:

VH F: 5' gtg gca tgc agg ttc aac tgg tgg agt ctg (SEQ ID NO:4)
VH R: 5' (asy)x tga gga gac ggt gac cag ggt tc (SEQ ID NO:5)

The SphI restriction site (gcatgc) is underscored. In the first reaction x was 6:

asy asy asy asy asy asy tga gga gac ggt gac cag ggt tc (SEQ ID NO:6)

In the second reaction, x was 9, giving SEQ ID NO:7:

asy asy asy asy asy asy asy asy asy tga gga gac ggt gac cag ggt tc

(In general, the number of triplets (x) can be 1 to about 50.)

In the third and fourth PCR reactions, the sequence encoding the VL domain was
amplified from a cDNA clone of CJ using the following synthetic oligonucleotides:

VL F: 5' (rst)z gac att cag atg acc cag tct cct tc (SEQ ID NO:8)
VL R: 5' cac cct agg cta tcg ttt gat cag tac ctt ggt ccc ctg (SEQ ID NO:9)

The AvrII site (cctagg) is underscored. In the third reaction z was 6:

rst rst rst rst rst rst gac att cag atg acc cag tct cct tc (SEQ ID NO:10)

In the fourth reaction, z was 9 (SEQ ID NO:11):

rst rst rst rst rst rst rst rst rst gac att cag atg acc cag tct cct tc

(In general, the number of triplets (z) can be 1 to about 50.)
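
The degenerate triplets in these primers determine which amino acids can appear in the
linker. The following Python sketch, offered only as an illustration, expands the triplets
using the standard IUPAC ambiguity codes (r = a/g, s = c/g, y = c/t) and the standard
genetic code. Because VH R is a reverse primer, its asy triplets are read on the coding
strand as their reverse complement. The helper names are hypothetical, and the 12-codon
linker used for the diversity estimate is an assumed example (x = 6 plus z = 6).

```python
from itertools import product

# IUPAC ambiguity codes used in the primers.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "s": "cg"}
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c",
              "r": "y", "y": "r", "s": "s"}
# Only the four codons that "rst" can produce (standard genetic code).
GENETIC_CODE = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expand(codon: str) -> list:
    """List every concrete codon matched by a degenerate triplet."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in codon))]


coding_triplet = reverse_complement("asy")  # triplet as read on the coding strand
codons = expand("rst")
residues = sorted({GENETIC_CODE[c] for c in codons})
variants_12_codons = len(codons) ** 12      # assumed 12-codon linker

print(coding_triplet, codons, residues, variants_12_codons)
```

Both primer families therefore contribute the same four codons (act, agt, gct, ggt),
which encode Thr, Ser, Ala and Gly.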
Following amplification, the four PCR products were purified and digested with
SphI for the VH chain PCR product and AvrII for the VL chain PCR product. The
digests were electrophoresed on agarose gels and the four digested PCR fragments
were purified, combined and ligated into a Geneware® expression vector pBSG1250
(pTTOSA derivative) containing a hybrid fusion of TMV and ToMV (Kumagai, et al.,
supra) that had been digested with the restriction enzymes SphI and AvrII. In the
particular Geneware® vector, the SphI site lies downstream of the TMV U1 CP
subgenomic promoter and the α-amylase signal peptide sequence. The SphI site in the
primer VH F is in-frame with the SphI site in the α-amylase signal peptide sequence.
After ligation of both the VH and VL PCR fragments into the Geneware® vector, the
DNA was treated with polynucleotide kinase and ATP to incorporate phosphates at the
blunt 5' ends of the initial PCR products.
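
The cloning sites used in this digestion can be located in the outer primers with a few
lines of code. The Python sketch below is illustrative only. The recognition sequences
(SphI, gcatgc; AvrII, cctagg) are the standard ones; the VL R sequence is entered on the
assumption that the scanned text "eta teg" reads "cta tcg", and the function name is
hypothetical.

```python
SPHI = "gcatgc"    # SphI recognition sequence
AVRII = "cctagg"   # AvrII recognition sequence

# Outer primers from SEQ ID NO:4 and SEQ ID NO:9 (spaces removed).
VH_F = "gtg gca tgc agg ttc aac tgg tgg agt ctg".replace(" ", "")
# Assumption: "eta teg" in the scanned source is an OCR misread of "cta tcg".
VL_R = "cac cct agg cta tcg ttt gat cag tac ctt ggt ccc ctg".replace(" ", "")


def site_positions(primer: str, site: str) -> list:
    """Return every 0-based offset at which the site occurs in the primer."""
    return [i for i in range(len(primer) - len(site) + 1)
            if primer[i:i + len(site)] == site]


sphi_hits = site_positions(VH_F, SPHI)
avrii_hits = site_positions(VL_R, AVRII)
print(sphi_hits, avrii_hits)
```

Each primer carries exactly one copy of its site, beginning after a three-nucleotide
5' overhang.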

Following the kinase reaction, the DNA was ligated back upon itself to generate
circular plasmids. The ligated DNA was transformed into E. coli (using
electroporation), and the transformed cells were plated on selective media containing
50 µg/ml ampicillin. Plasmid DNA was purified from individual ampicillin-resistant
E. coli colonies and transcribed with T7 RNA polymerase to generate infectious
transcripts of individual clones.

Transcripts were transfected into N. tabacum plant protoplasts using a
PEG-based transfection protocol essentially as described in Lindbo et al., Plant Cell
5:1749-1759 (1993), and transfected protoplasts were incubated in protoplast culture
medium for several days. The latter medium contained 265 mM mannitol, 1X
Murashige minimal organics medium (Gibco/BRL), 1.5 mM KH2PO4, 0.2 µg/ml
2,4-dichlorophenoxyacetic acid, 0.1 µg/ml kinetin, and 5% coconut water (Sigma).
Protoplasts were cultured at a density of about 10^6 cells/ml. Plasmid DNA was purified
from at least 10 to 50 individual colonies from each cloning experiment.

Approximately 1-4 days after transfection, protein samples were collected from
the individual protoplast samples. Culture medium (200-500 µl) was concentrated about
10-fold by speed vacuum evaporation or Microcon sample concentrator.
Since this cloning strategy included a signal peptide sequence designed to
promote secretion of the protein product by the plant cells into the culture medium,
medium samples were also analyzed by SDS-PAGE followed by Coomassie blue
staining and/or by Western blotting.

The starting scFv incorporated the standard (Gly4-Ser)3 linker sequence; the
other scFv chains were randomly selected from the transformants obtained from the
linker library cloning experiment that utilized the cloned PCR products generated from
the four primers (SEQ ID NO:4-11, above). Culture supernatants from equivalent
numbers of cells were electrophoresed (SDS-PAGE), and the gels were transferred to
nitrocellulose membranes for Western analysis with mAb 7D11 (see above).

Some selected linker library members that were screened randomly appeared to
express and accumulate as much or more CJ protein as did the CJ scFv having the
conventional linker (Gly4-Ser)3.

DNA of those library members expressing particularly high amounts of CJ scFv
was sequenced. Results are shown in Table 2. Plasmid DNAs for select clones were
prepared and sequenced by standard methods. From the nucleotide sequences of the
various CJ-derived constructs, the linker sequence of individual clones was deduced.
Table 2 lists some of the nucleotide and amino acid linker sequences obtained and
indicates "relative expression" which means the amount of expression relative to the
same protein but with the (Gly4Ser)3 linker.






Table 2

Analysis of select members of the CJ linker library experiment in plant protoplasts

Clone  Linker Region Nucleotide Sequence (lower case) and            SEQ ID  Length  RE*
       Amino Acid Sequence (upper case)                              NO:     (aa)

#24    actactgctactggtgctagtactactgctggtgctagt                       12      13      ++
       TTATGASTTAGAS                                                 13

#36    gctactgctgctagtggtgctgctgctggtggtggtact                       14      13      +
       ATAASGAAAGGGT                                                 15

#37    gctactggtgctagtactagtgctactgctggtggtagt                       16      13      ++
       ATGASTSATAGGS                                                 17

#20    agtactgctgctggtactagtagtggtagtagtactggt                       18      13      ++
       STAAGTSSGSSTG                                                 19

#12    gctagtactgctactagtagtggtggtggtggtactggtagtagtgctgct           20      17      +
       ASTATSSGGGTGSSAAA                                             21

#16    gctactagtactgctgctgctggtgctactagtgctactggtggtgctagtggtactggt  22      20      +++
       ATSTAAAGATSATGGASGTG                                          23

#30    actggtgctagtggtgctactagtagtggtagtagtagt                       24      13      +++
       TGASGATSSGSSS                                                 25

* RE = Relative Expression to the (Gly4Ser)3 clone

DNA sequencing revealed that the clones did not have the same nucleotide or
amino acid sequences but rather demonstrated amino acid and nucleotide length
diversity. Table 2 shows a sampling of clones with linkers (L) ranging from 13 to 20
amino acids. This range was apparently a result of mispriming during PCR amplification
of the VH and VL coding sequences. Since the linker coding sequences of the
oligonucleotides used in this experiment contain stretches of low complexity nucleotide
sequences (i.e., (asy)x or (rst)z), multiple mispriming events are likely. In
conjunction with DNA polymerase/exonuclease activities present during PCR, this
could lead to an increase or a decrease in the number of codons comprising the L
sequences.
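
The length diversity can be checked directly from the sequences in Table 2. The Python
sketch below, given only as an illustration, translates three of the listed linkers with
the standard genetic code and compares their lengths with the lengths expected from
primer design. The expected lengths rest on an assumption not stated explicitly above:
that joining the (asy)x end of a VH product to the (rst)z end of a VL product yields a
linker of x + z codons.

```python
# Standard genetic code, restricted to the four codons found in the linkers.
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def translate(nt: str) -> str:
    nt = nt.lower()
    return "".join(CODON_TO_AA[nt[i:i + 3]] for i in range(0, len(nt), 3))


# Linker nucleotide sequences copied from Table 2.
clones = {
    "#24": "actactgctactggtgctagtactactgctggtgctagt",
    "#16": "gctactagtactgctgctgctggtgctactagtgctactggtggtgctagtggtactggt",
    "#30": "actggtgctagtggtgctactagtagtggtagtagtagt",
}

peptides = {name: translate(seq) for name, seq in clones.items()}
observed_lengths = {name: len(pep) for name, pep in peptides.items()}

# Assumed design lengths: x + z codons for x, z in {6, 9}.
designed_lengths = {x + z for x in (6, 9) for z in (6, 9)}
off_design = {n: l for n, l in observed_lengths.items()
              if l not in designed_lengths}

print(peptides)
print(sorted(designed_lengths), off_design)
```

Under that assumption the designed lengths would be 12, 15 and 18 residues, so the
13- and 20-residue linkers observed here are consistent with the mispriming explanation
given above.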

The quantities of scFv protein produced also varied (relative to the CJ scFv
with the (Gly4Ser)3 linker). This indicates that both the length and the sequence of the
linker region affect the amount of protein produced by the plant cells or plants.

EXAMPLE 2
Expression of scFv Product in Whole Plants

The process described in Example 1 is repeated except that whole plants are used
along with a suitable expression system for producing the scFv products.
Expressed products are screened by SDS-PAGE/Coomassie blue staining and/or
Western blotting. The results indicate a varied amount of scFv product produced. The
highest yielding clones are selected for production of the vaccine scFv.

Expression system

The DNA fragments encoding the dual-domain scFv fragments having the V
regions of the CJ human lymphoma were generated as in Example 1 and cloned into
vector pBSG1250. In this vector, a TMV coat protein subgenomic promoter is located
upstream of the insertion site of the CJ sequence. Following infection, this TMV coat
protein subgenomic promoter directs initiation of CJ RNA synthesis in plant cells at
the transcription start point ("tsp"). The rice α-amylase signal peptide (O'Neill, S.D. et
al. (1990) Mol. Gen. Genet. 221:235-244), fused in-frame to the CJ sequence, encodes a
31 residue polypeptide which targets proteins to the secretory pathway (Firek, S. et al.
(1994) Transgenic Res. 3:326-331), and is subsequently cleaved off between the
C-terminal Gly of the signal peptide and the N-terminal Met of the expressed CJ scFv
protein. The sequence encoding CJ scFv has been introduced between the 30K
movement protein and the ToMV coat protein (Tcp) genes. A T7 phage promoter has
been introduced upstream of the viral cDNA, allowing for transcription of infective
genomic plus-strand RNA.

Capped infectious RNA was made in vitro from 1 µg plasmid, using a T7
message kit from Ambion. Synthesis of the message was quantified by gel
electrophoresis and approximately 2 µg of the in vitro transcribed viral RNA was
applied with an abrasive to the lower leaves (approximately 1-2 cm in size) of N.
benthamiana (Dawson, W.O. et al. (1986) Proc. Natl. Acad. Sci. USA 83:1832-1836).
Transcription of subgenomic RNA encoding the CJ scFv protein was initiated after
infection at the indicated transcription start point. High levels of subgenomic RNA
species were synthesized in virus-infected plant cells (Kumagai, M.H. et al. (1993) Proc.
Natl. Acad. Sci. USA 90:427-430), and served as templates for the translation and
subsequent accumulation of CJ scFv protein.

Characterization of clones

Signs of infection were visible after 5-6 days as mild leaf deformation, with
some variable leaf mottling and growth retardation. Eleven to fourteen days post
inoculation, the secreted proteins were isolated. Leaf and stem material was harvested,
weighed and then subjected to a 700 mm Hg vacuum for 2 min in infiltration buffer
(100 mM Tris-HCl, pH 7.5 and 2 mM EDTA). Secreted proteins (hereafter termed
"interstitial fraction" or "IF") were recovered from infiltrated leaves by mild
centrifugation at 2000g (Beckman JA-14) on supported nylon mesh discs, and concentrated
approximately 10-fold in Centricon-10 (Amicon) concentrators. Total protein was
measured by the Bradford method (Bradford, M. (1976) Anal. Biochem. 72:248-254)
and samples were stored at -80°C until used.

The secreted material was analyzed for the presence of soluble CJ scFv protein
by SDS-PAGE followed by Western blot with CJ mAb 7D11. About 3 µg of IF
protein were separated by SDS-PAGE and transferred to nitrocellulose membrane in
standard Tris-glycine buffer with 20% methanol at 150V for 1 hour. After transfer,
blots were treated for 20 minutes at room temperature with blocking buffer (50 mM Tris
pH 8, 150 mM NaCl, 1 mM EDTA, 2.5% non-fat dry milk, 2.5% BSA and 0.05% Tween
20) followed by a 16 hr incubation at 4°C in blocking buffer plus 1 ng/ml purified 7D11
antibody. After three 15 minute washes (100 mM Tris pH 8, 150 mM NaCl, 1 mM
EDTA and 0.1% Tween 20), membranes were incubated for 1 hour in blocking buffer
plus 1 µg/ml goat anti-mouse IgG-HRP (Southern Biotechnology). After three 15
minute washes, Western blots were developed by Enhanced Chemiluminescence (ECL)
(Amersham) according to the manufacturer's instructions. Exposure times ranged from 1
to 5 seconds. No cross reactivity to plant proteins was observed (testing IF extracts from
control infected plants).

Individual clones were sequenced, analyzed for reading frame and amino acid
identity to the original CJ Ig sequence and then screened for protein expression in
infected plants. Figure 1 shows the results of 9 individual CJ scFv expressing clones
that demonstrated various levels of protein accumulation. Clones 20 and 30 showed
high levels of expression, as well as accumulation of protein dimers. Clone C contained
a modification of the (Gly3Ser)4 linker.

From the sequence data, the linker sequences for individual clones were
deduced. The clone numbers in Table 3 are the same as those listed in Table 2. As
above, relative expression relates to the scFv protein having the (Gly4Ser)3 linker.

As above, differences were observed in the expression of various CJ scFv-based 
clones in whole plants. Interestingly, some clones that were expressed in plant 
protoplasts were not expressed in whole plants. For example, clone #16 which was 
strongly expressed in plant protoplasts was apparently not expressed in whole plants. 
Nevertheless, the methods disclosed for generating the linker regions with varying 
length and sequence permit the screening of large numbers of clones for their expression 
in either plant protoplast or whole plants. 
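
The comparison between the two expression systems can be expressed as a simple set
difference. The Python sketch below is illustrative only. The protoplast scores are
copied from Table 2 and the whole-plant clone list from Table 3; treating absence from
Table 3 as "not expressed in whole plants" and scoring RE by counting "+" marks are
assumptions made for the sake of the example.

```python
# Relative expression (RE) marks in protoplasts, copied from Table 2.
protoplast_re = {
    "#24": "++", "#36": "+", "#37": "++", "#20": "++",
    "#12": "+", "#16": "+++", "#30": "+++",
}
# Clones listed in Table 3 (whole plants).
whole_plant_clones = ["#24", "#36", "#37", "#20", "#12", "#30"]


def score(re_mark: str) -> int:
    """Hypothetical ordinal score: the number of '+' marks."""
    return re_mark.count("+")


lost_in_plants = sorted(c for c in protoplast_re if c not in whole_plant_clones)
top_in_protoplasts = sorted(c for c, m in protoplast_re.items() if score(m) == 3)

print(lost_in_plants, top_in_protoplasts)
```

Clone #16 is among the strongest expressers in protoplasts yet is the only clone missing
from the whole-plant table, which is why candidates are screened in the system intended
for production.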



Table 3

Analysis of select members of the CJ linker library experiment in whole plants

Clone  Linker Region Nucleotide Sequence (lower case) and            SEQ ID  Length  RE*
       Amino Acid Sequence (upper case)                              NO:     (aa)

#24    actactgctactggtgctagtactactgctggtgctagt                       12      13      ++
       TTATGASTTAGAS                                                 13

#36    gctactgctgctagtggtgctgctgctggtggtggtact                       14      13      +
       ATAASGAAAGGGT                                                 15

#37    gctactggtgctagtactagtgctactgctggtggtagt                       16      13      ++
       ATGASTSATAGGS                                                 17

#20    agtactgctgctggtactagtagtggtagtagtactggt                       18      13      ++
       STAAGTSSGSSTG                                                 19

#12    gctagtactgctactagtagtggtggtggtggtactggtagtagtgctgct           20      17      +
       ASTATSSGGGTGSSAAA                                             21

#30    actggtgctagtggtgctactagtagtggtagtagtagt                       24      13      +++
       TGASGATSSGSSS                                                 25

* RE = Relative Expression to the (Gly4Ser)3 clone
The quality of CJ protein, optimized by the random linker library, was validated
by two methods. First, CJ protein was purified by affinity chromatography using
immobilized 7D11 anti-idiotype mAb. This method requires that the CJ protein bind to
the anti-Id column under physiological conditions. Such binding will not occur if the
protein is not folded correctly. Protein was bound under normal pH and was eluted by
50 mM diethylamine pH 11.5, then immediately dialyzed against normal saline.
Material was quantitated by ELISA using 7D11 and using standard protein
determination.

The second, more stringent, assay for the quality of the CJ protein was a
functional assay in animals. Clone CJLL20 (for Linker Library pick #20) was purified
by 7D11 affinity chromatography and administered to five mice in 3 bi-weekly
immunizations of 30 µg each. Ten days after the third injection, serum was sampled.
Using the native idiotype (ID 12), or an isotype-matched irrelevant human antibody in a
sandwich ELISA, the sera were tested for specific responses to the CJ idiotype. Results
are shown in Figure 2.

Non-specific antibody responses to xenogeneic human Ig determinants were
present in only 3 of the 5 animals and in very low amounts (detected as minimal
cross-reactivity of the murine sera to an unrelated human antibody).

The sera of all 5 mice had high titers of anti-CJ antibodies (Figure 2). Thus, the
immune response induced by the dual-domain scFv polypeptide was highly specific for
the original VH and VL domains of the original Ig, as predicted and as desired. These
results suggested that the protein produced in plants was folded correctly so that it could
induce an appropriate immune response when administered to subjects.

EXAMPLE 3 
Expression of scFv Product in Whole Plants 

The process described in Example 2 was repeated except that a different human 
scFv with unknown expression characteristics was used along with a suitable expression 
system for producing the scFv products. 
Expressed products were screened by SDS-PAGE/Coomassie blue staining. The
results indicated that the amount of scFv product produced varied based on linker 
composition. The highest yielding clones are selected for production of a vaccine scFv. 

Expression system 

The DNA fragments encoding the dual-domain scFv fragments having the V
regions of the Go19 human lymphoma were generated as in Example 1 and cloned into
pl324-MBP, a modified 30B vector (Shivprasad, S. et al. (1999) Virology 255:312-
323), containing a hybrid fusion of TMV and TMGMV-U5 as well as the rice α-amylase
signal peptide with Sph I and Avr II insert cloning sites.

In this vector, a TMV coat protein subgenomic promoter is located upstream of
the insertion site of the Go19 sequence. Following infection, this TMV coat protein
subgenomic promoter directs initiation of Go19 RNA synthesis in plant cells at the
transcription start point ("tsp"). The rice α-amylase signal peptide (O'Neill, SD et al.
(1990) Mol. Gen. Genet. 221:235-244), fused in-frame to the Go19 sequence, encodes a
31 residue polypeptide which targets proteins to the secretory pathway (Firek, S. et al.
(1994) Transgenic Res. 3:326-331), and is subsequently cleaved off between the
C-terminal Gly of the signal peptide and the N-terminal Met of the expressed Go19 scFv
protein. The sequence encoding Go19 scFv was introduced between the 30K
movement protein and the TMGMV-U5 coat protein (Tcp) genes. A T7 phage RNA
polymerase promoter was introduced upstream of the viral cDNA, allowing for
transcription of infective genomic plus-strand RNA.

The Go19 V regions were amplified in four separate PCR reactions. In the first
and second reactions, the sequence encoding the VH domain was amplified from a
cDNA clone derived from the lymphoma cells of patient Go19 using the following
synthetic oligonucleotides:

VH F: 5' cct gca tgc tgg agg tgc agt tgg tgg aat c (SEQ ID NO:26)

VH R: 5' (asy)x aga gga gac ggt gac cat ga (SEQ ID NO:27)

The SphI restriction site (gca tgc) is underscored above. In the first reaction, x was 4:

5'-asy asy asy asy aga gga gac ggt gac cat ga (SEQ ID NO:28)

In the second reaction, x was 9 (SEQ ID NO:29):

5'-asy asy asy asy asy asy asy asy asy aga gga gac ggt gac cat ga

(In general, the number of triplets (x) can be 1 to about 50.)

In the third and fourth PCR reactions, the sequence encoding the VL domain was
amplified from a cDNA clone of Go19 using the following synthetic oligonucleotides:

VL F: 5' (rst)z cag tct gcc ctg act cag t (SEQ ID NO:30)

VL R: 5' cac cct agg tca acc aag gac ggt cag gtt ggt c (SEQ ID NO:31)

The Avr II restriction site (cct agg) is underscored above. In the third reaction, z was 6:

5'-rst rst rst rst rst rst cag tct gcc ctg act cag t (SEQ ID NO:32)

In the fourth reaction, z was 9, giving SEQ ID NO:33:

5'-rst rst rst rst rst rst rst rst rst cag tct gcc ctg act cag t

(In general, the number of triplets (z) can be 1 to about 50.)
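
The degenerate triplets in the primers above can be illustrated with a short Python sketch. It is not part of the original disclosure, and all function and variable names are hypothetical. It uses the standard IUPAC nucleotide codes (r = a or g, y = c or t, s = c or g) and assumes that each repeated triplet is read in frame as one codon. Under that assumption, the reverse complement of the "asy" triplet of the VH R primer is "rst", so both primers contribute the same four sense-strand codons (act, agt, gct, ggt), which encode Thr, Ser, Ala and Gly, and a run of n triplets can encode 4^n different linker sequences.

```python
from itertools import product

# Standard IUPAC nucleotide codes (only the symbols needed here).
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "s": "cg"}
# Complement of each symbol, including the degenerate ones.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a",
              "r": "y", "y": "r", "s": "s"}
# Subset of the standard genetic code covering the codons that "rst" can yield.
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expand(degenerate):
    """List every concrete sequence encoded by a degenerate sequence."""
    return sorted("".join(p) for p in product(*(IUPAC[b] for b in degenerate)))


def library_size(n_triplets):
    """Number of distinct linkers encoded by n_triplets in-frame 'rst' codons."""
    return len(expand("rst")) ** n_triplets


if __name__ == "__main__":
    print(reverse_complement("asy"))            # sense-strand form of the VH R triplet
    codons = expand("rst")
    print({c: CODON_TO_AA[c] for c in codons})  # codon -> one-letter amino acid
    for n in (4, 6, 9):                         # triplet counts used in the primers above
        print(n, library_size(n))
```

The restriction to four small, flexible residues is consistent with the linker sequences reported for the individual clones, which contain only Gly, Ala, Ser and Thr.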

Prior to PCR amplification, the VH R and VL R oligonucleotides were treated with
polynucleotide kinase and ATP to add phosphates at the 5' end of the oligonucleotides.
Following amplification, the four PCR products are purified and the VH and VL products
are ligated together to create the scFv. The scFv ligation products are re-purified and
restriction digested with SphI and Avr II, and the digested scFv is gel isolated and ligated
into the Geneware® vector. The ligated DNA was transformed into E. coli (using
electroporation), and the transformed cells were plated on selective media containing 50
µg/ml ampicillin. Plasmid DNA was purified from individual ampicillin-resistant E.
coli colonies.

Capped infectious RNA was made in vitro from approximately 0.5 µg plasmid,
using a T7 message kit from Ambion. Synthesis of the message was evaluated by gel
electrophoresis, and approximately 2 µg of the in vitro transcribed viral RNA was
encapsidated with purified TMV-U1 coat protein in 100 mM sodium phosphate, pH 7.0,
at room temperature for a minimum of 6 hours. Encapsidated transcripts are applied
with an abrasive to the lower leaves (approximately 1-2 cm in size) of N. benthamiana
(W.O. Dawson et al. (1986) Proc. Natl. Acad. Sci. USA 83:1832-1836). Transcription of
subgenomic RNA encoding the Go19 scFv protein was initiated after infection at the
indicated transcription start point. High levels of subgenomic RNA species were
synthesized in virus-infected plant cells (M.H. Kumagai et al. (1993) Proc. Natl. Acad.
Sci. USA 90:427-430), and served as templates for the translation and subsequent
accumulation of Go19 scFv protein.

Characterization of clones 

Signs of infection were visible after 5-6 days as mild leaf deformation, with
some variable leaf mottling and growth retardation. Eleven to fourteen days post
inoculation, the secreted proteins were isolated. Approximately 0.1 g of infected leaf
material was harvested, placed into a 96-well glass fiber filtration block
(Whatman/Polyfiltronics), and submerged in infiltration buffer (20 mM Tris-HCl, pH 7.0,
10 mM 2-mercaptoethanol). The tissue is subjected to a 700 mm Hg vacuum for 30
seconds, the vacuum is released, and the vacuum process is repeated for at least one
additional round. Residual buffer is removed by a low speed spin at 30 x g in a plate
centrifuge. Secreted proteins (hereafter termed "interstitial fraction" or "IF") were
recovered from infiltrated leaves by mild centrifugation at 1700 x g in a plate centrifuge
and collected into a 96 well polypropylene plate.

The secreted material was analyzed for the presence of soluble Go19 scFv
protein by SDS-PAGE. IF (27 µl containing approximately 5 µg of protein) was
separated by SDS-PAGE. Linkers from individual clones were sequenced, analyzed for
reading frame and amino acid content and then screened for protein expression in
infected plants. Figure 3 shows the results of 22 individual Go19 scFv expressing
clones that demonstrated various levels of protein accumulation. Clones C5, E1 and
E9 showed high levels of expression with minimal protease degradation.
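
The reading-frame and amino acid content analysis mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used: the function name and the returned fields are hypothetical, a linker is taken to be in frame when its length is a multiple of three, and the codon table is limited to the four codons (act, agt, gct, ggt) that the degenerate "rst" triplet can produce. The example inputs are the linker sequences reported for clones C5 and E9.

```python
from collections import Counter

# Subset of the standard genetic code; these are the only codons that the
# degenerate triplet "rst" can produce.
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def analyze_linker(nucleotides):
    """Check reading frame and report the amino acid content of a linker."""
    seq = nucleotides.lower()
    in_frame = len(seq) % 3 == 0
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # "X" marks any codon outside the four expected ones.
    protein = "".join(CODON_TO_AA.get(codon, "X") for codon in codons)
    return {
        "in_frame": in_frame,
        "protein": protein,
        "length_aa": len(protein),
        "composition": dict(Counter(protein)),
    }


if __name__ == "__main__":
    for name, seq in [("C5", "ggtgctggtggtggt"),
                      ("E9", "agtactggtagtagtggtgctggt")]:
        print(name, analyze_linker(seq))
```

A linker whose length is not a multiple of three would shift the downstream VL domain out of frame, so such clones can be set aside before expression screening.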

From the sequence data, the linker sequences for individual clones were deduced 
as shown in Table 4. 
Table 4. Analysis of select members of the Go19 linker library experiment in whole plants

Clone     Linker Region Nucleotide Sequence (lower case) and     SEQ ID    Length    RE*
          Amino Acid Sequence (upper case)                       NO:       (aa)

#C5       ggtgctggtggtggt                                        34        5         ***
          GAGGG                                                  35

if PI 0   Af1" nnt" oat" aat aataotaataataataat                  36        10        ***
          TGGGGGSGGG                                             37

#C11      actactactactgctactactgctggtagtggtgct                   38        12        **
          TTTTATTAGSGA                                           39

#E1       gctagtactggtgct                                        40        5         ***
          ASTGA                                                  41

#E9       agtactggtagtagtggtgctggt                               42        8         ***
          STGSSGAG                                               43

#E3       gctagtagtggtgctagtgct                                  44        7         *
          ASSGASA                                                45

#C4       gctagtggtggtactgctggtactggtggtagtagtact                46        13        **
          ASGGTAGTGGSST                                          47

#E4       actagtggtagtggtgctagtgctgctgctggtggtgctgctgctagtgct    48        17        *
          TSGSGASAAAGGAAASA                                      49

* RE = Relative Expression to Go19 scFv library clones



As above, differences were observed in the expression of various Go19 scFv-based
clones in whole plants as well as in the degree of degradation, indicated by the presence of
protein accumulation between the 6.5 kDa and 21 kDa marker bands. The methods
disclosed for generating the linker regions with varying length and sequence permit the
screening of large numbers of clones for their expression in either plant protoplasts or
whole plants.

EXAMPLE 4 
scFv-Detectably Labeled Conjugates

A mAb to HER-2/neu inhibits growth of cells of the breast cancer cell line
SK-Br-3 (ATCC HTB 30) in 6 day culture. Such treatment sensitizes these cells to
chemotherapeutic agents (US 5,677,171).
The process of Example 1 is repeated using the VH and VL regions of an scFv that
specifically binds the HER-2/neu (erbB-2) protein. The scFv gene encoding such a
polypeptide is described in Wels et al., Biotechnology 10:1128-1132 (1992). Using the
same repeated triplet nucleotide sequences as in Example 1, the 3' end of the erbB-2
scFv DNA construct is linked to the 5' end of the horseradish peroxidase gene using
appropriate PCR primers modeled on the method in Example 1.

High yielding clones are identified by measuring peroxidase activity in the
supernatant. High affinity and avidity are determined by immunohistochemical
detection, with substrate and chromophore, on control samples of a breast cancer cell
line that overexpresses HER-2/neu. Comparisons are made to conventional labeled
mAbs to HER-2/neu (such as DAKO HercepTest, Dako Corp., Carpinteria, CA) to
determine which clones produce acceptable scFv proteins.



EXAMPLE 5 

scFv-Toxin Conjugate Production

The process of Example 4 is repeated, with the following modification. The 
gene for the ricin A chain is linked to the 3' end of the scFv DNA construct through the 
linker region of this invention (made up of repeated triplet nucleotides). 

The plant cell clones are grown in 24 well plates and screened initially by
measuring secreted protein (PAGE followed by Coomassie blue staining). Two day
culture supernatants from the wells in which each clone is growing are tested for
cytotoxic activity toward target cells by incubation with active cultures of SK-Br-3 in
six well plates (Costar). Cytotoxicity against these targets is determined 48 hours later
by microscopic inspection.

High producing clones that generate strong cytotoxicity are selected. Calluses
are formed from these cultures to regenerate plants for field growth and large scale
production.

Humanized mAb to HER-2/neu is an FDA approved therapeutic for breast
cancer (HERCEPTIN, Genentech, Inc., South San Francisco, CA). It is expected that
toxin-conjugated scFv specific for the same antigen will be at least equally and probably
more cytotoxic to human breast cancer cells.

EXAMPLE 6 

Production of Dual-Domain Ribozymes

The process of Example 1 is repeated except that DNA encoding two different 
ribozyme domains is used. The vector that contains the subcloned dual ribozyme 
domains is transcribed to produce RNA with the properties of the respective ribozyme 
domains. 

The amount of transcribed RNA product can be determined by hybridization
with an oligonucleotide probe, by spectrophotometric measurements, etc. The amount
of activity of either ribozyme domain can be measured using the appropriate assay.

EXAMPLE 7

Production of Dual DNA Domains 

The process of Example 1 is repeated except that two different DNA domains are used,
each of which binds a protein. The plasmid DNA can be produced in large amounts,
and the dual DNA domain molecule can be excised with a restriction endonuclease. The
resulting fragment has the two linked DNA domains and can be assayed for its ability to
bind to a DNA binding protein (e.g., transcription factor, restriction endonuclease,
polymerase, etc.).

The references cited above are all incorporated by reference herein, whether 
specifically incorporated or not. 

Having now fully described this invention, it will be appreciated by those skilled
in the art that the same can be performed within a wide range of equivalent parameters,
concentrations, and conditions without departing from the spirit and scope of the
invention and without undue experimentation.






While this invention has been described in connection with specific 
embodiments thereof, it will be understood that it is capable of further modifications. 
This application is intended to cover any variations, uses, or adaptations of the invention 
following, in general, the principles of the invention and including such departures from 
the present disclosure as come within known or customary practice within the art to 
which the invention pertains and as may be applied to the essential features hereinbefore 
set forth as follows in the scope of the appended claims. 